(57) Abstract

A method for selecting recombinant host cells expressing high levels of a desired protein is described. This method utilizes eukaryotic
host cells harboring a DNA construct comprising a selectable gene (preferably an amplifiable gene) and a product gene provided 3' to
the selectable gene. The selectable gene is positioned within an intron defined by a splice donor site and a splice acceptor site, and the
selectable gene and product gene are under the transcriptional control of a single transcriptional regulatory region. The splice donor site
is generally an efficient splice donor site and thereby regulates expression of the product gene using the transcriptional regulatory region.
The transfected cells are cultured so as to express the gene encoding the product in a selective medium comprising an amplifying agent
for sufficient time to allow amplification to occur, whereupon either the desired product is recovered or cells having multiple copies of the
product gene are identified.
METHOD FOR SELECTING HIGH-EXPRESSING HOST CELLS
BACKGROUND OF THE INVENTION

Field of the Invention 

This invention relates to a method of selecting for high-expressing
host cells, a method of producing a protein of interest in high yields, and
a method of producing eukaryotic cells having multiple copies of a sequence
encoding a protein of interest.

Description of Background and Related Art 

The discovery of methods for introducing DNA into living host cells
in a functional form has provided the key to understanding many fundamental
biological processes, and has made possible the production of important
proteins and other molecules in commercially useful quantities.

Despite the general success of such gene transfer methods, several
common problems exist that may limit the efficiency with which a gene
encoding a desired protein can be introduced into and expressed in a host
cell. One problem is knowing when the gene has been successfully
transferred into recipient cells. A second problem is distinguishing
between those cells that contain the gene and those that have survived the
transfer procedures but do not contain the gene. A third problem is
identifying and isolating those cells that contain the gene and that are
expressing high levels of the protein encoded by the gene.

In general, the known methods for introducing genes into eukaryotic
cells tend to be highly inefficient. Of the cells in a given culture, only
a small proportion take up and express exogenously added DNA, and an even
smaller proportion stably maintain that DNA.

Identification of those cells that have incorporated a product gene
encoding a desired protein typically is achieved by introducing into the
same cells another gene, commonly referred to as a selectable gene, that
encodes a selectable marker. A selectable marker is a protein that is
necessary for the growth or survival of a host cell under the particular
culture conditions chosen, such as an enzyme that confers resistance to an
antibiotic or other drug, or an enzyme that compensates for a metabolic or
catabolic defect in the host cell. For example, selectable genes commonly
used with eukaryotic cells include the genes for aminoglycoside
phosphotransferase (APH), hygromycin phosphotransferase (hyg),
dihydrofolate reductase (DHFR), thymidine kinase (tk), neomycin resistance,
puromycin resistance, glutamine synthetase, and asparagine synthetase.

The method of identifying a host cell that has incorporated one gene
on the basis of expression by the host cell of a second incorporated gene
encoding a selectable marker is referred to as cotransfectation (or
cotransfection). In that method, a gene encoding a desired polypeptide and
a selectable gene typically are introduced into the host cell
simultaneously, although they may be introduced sequentially. In the case
of simultaneous cotransfectation, the gene encoding the desired polypeptide
and the selectable gene may be present on a single DNA molecule or on
separate DNA molecules prior to being introduced into the host cells.
Wigler et al., Cell, 16:777 (1979). Cells that have incorporated the gene
encoding the desired polypeptide then are identified or isolated by
culturing the cells under conditions that preferentially allow for the
growth or survival of those cells that synthesize the selectable marker
encoded by the selectable gene.

The level of expression of a gene introduced into a eukaryotic host
cell depends on multiple factors, including gene copy number, efficiency
of transcription, messenger RNA (mRNA) processing, stability, and
translation efficiency. Accordingly, high-level expression of a desired
polypeptide typically will involve optimizing one or more of those factors.

For example, the level of protein production may be increased by
covalently joining the coding sequence of the gene to a "strong" promoter
or enhancer that will give high levels of transcription. Promoters and
enhancers are nucleotide sequences that interact specifically with proteins
in a host cell that are involved in transcription. Kriegler, Meth.
Enzymol., 185:512 (1990); Maniatis et al., Science, 236:1237 (1987).
Promoters are located upstream of the coding sequence of a gene and
facilitate transcription of the gene by RNA polymerase. Among the
eukaryotic promoters that have been identified as strong promoters for
high-level expression are the SV40 early promoter, adenovirus major late
promoter, mouse metallothionein-I promoter, Rous sarcoma virus long
terminal repeat, and human cytomegalovirus immediate early promoter (CMV).

Enhancers stimulate transcription from a linked promoter. Unlike
promoters, enhancers are active when placed downstream from the
transcription initiation site or at considerable distances from the
promoter, although in practice enhancers may overlap physically and
functionally with promoters. For example, all of the strong promoters
listed above also contain strong enhancers. Bendig, Genetic Engineering,
7:91 (Academic Press, 1988).

The level of protein production also may be increased by increasing
the gene copy number in the host cell. One method for obtaining high gene
copy number is to directly introduce into the host cell multiple copies of
the gene, for example, by using a large molar excess of the product gene
relative to the selectable gene during cotransfectation. Kaufman, Meth.
Enzymol., 185:537 (1990). With this method, however, only a small
proportion of the cotransfected cells will contain the product gene at high
copy number. Furthermore, because no generally applicable, convenient
method exists for distinguishing such cells from the majority of cells that
contain fewer copies of the product gene, laborious and time-consuming
screening methods typically are required to identify the desired
high-copy-number transfectants.

Another method for obtaining high gene copy number involves cloning
the gene in a vector that is capable of replicating autonomously in the
host cell. Examples of such vectors include mammalian expression vectors
derived from Epstein-Barr virus or bovine papilloma virus, and yeast
2-micron plasmid vectors. Stephens & Hentschel, Biochem. J., 248:1 (1987);
Yates et al., Nature, 313:612 (1985); Beggs, Genetic Engineering, 2:175
(Academic Press, 1981).

Yet another method for obtaining high gene copy number involves gene
amplification in the host cell. Gene amplification occurs naturally in
eukaryotic cells at a relatively low frequency. Schimke, J. Biol. Chem.,
263:5989 (1988). However, gene amplification also may be induced, or at
least selected for, by exposing host cells to appropriate selective
pressure. For example, in many cases it is possible to introduce a product
gene together with an amplifiable gene into a host cell and subsequently
select for amplification of the amplifiable gene by exposing the
cotransfected cells to sequentially increasing concentrations of a
selective agent. Typically the product gene will be coamplified with the
amplifiable gene under such conditions.

The most widely used amplifiable gene for that purpose is a DHFR
gene, which encodes a dihydrofolate reductase enzyme. The selection agent
used in conjunction with a DHFR gene is methotrexate (Mtx). A host cell
is cotransfected with a product gene encoding a desired protein and a DHFR
gene, and transfectants are identified by first culturing the cells in
culture medium that contains Mtx. A suitable host cell when a wild-type
DHFR gene is used is the Chinese Hamster Ovary (CHO) cell line deficient
in DHFR activity, prepared and propagated as described by Urlaub & Chasin,
Proc. Natl. Acad. Sci. USA, 77:4216 (1980). The transfected cells then are
exposed to successively higher amounts of Mtx. This leads to the synthesis
of multiple copies of the DHFR gene and, concomitantly, multiple copies of
the product gene. Schimke, J. Biol. Chem., 263:5989 (1988); Axel et al.,
U.S. Patent No. 4,399,216; Axel et al., U.S. Patent No. 4,634,665. Other
references directed to cotransfection of a gene together with a genetic
marker that allows for selection and subsequent amplification include
Kaufman in Genetic Engineering, ed. J. Setlow (Plenum Press, New York),
Vol. 9 (1987); Kaufman and Sharp, J. Mol. Biol., 159:601 (1982); Ringold
et al., J. Mol. Appl. Genet., 1:165-175 (1981); Kaufman et al., Mol. Cell.
Biol., 5:1750-1759 (1985); Kaetzel and Nilson, J. Biol. Chem.,
263:6244-6251 (1988); Hung et al., Proc. Natl. Acad. Sci. USA, 83:261-264
(1986); Kaufman et al., EMBO J., 6:87-93 (1987); Johnston and Kucey,
Science, 242:1551-1554 (1988); Urlaub et al., Cell, 33:405-412 (1983).
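The stepwise exposure to successively higher Mtx concentrations described above can be pictured as a simple escalation schedule. The sketch below is illustrative only: the function name `mtx_schedule`, the starting concentration, the fold increase per step and the ceiling are hypothetical parameters, not values taken from this specification. In practice, cells are typically grown at each concentration until a resistant population has emerged before the next increase.

```python
def mtx_schedule(start_nm: float, fold: float, max_nm: float) -> list[float]:
    """Return a hypothetical stepwise Mtx escalation schedule in nM.

    Each step multiplies the previous concentration by `fold`;
    concentrations above `max_nm` are not included.
    """
    if start_nm <= 0 or fold <= 1 or max_nm < start_nm:
        raise ValueError("need start_nm > 0, fold > 1 and max_nm >= start_nm")
    steps = []
    conc = start_nm
    while conc <= max_nm:
        steps.append(conc)
        conc *= fold
    return steps


# Made-up parameters: start at 50 nM, raise 4-fold per step, stop at 1000 nM.
for step, conc in enumerate(mtx_schedule(50, 4, 1000), start=1):
    print(f"step {step}: {conc:g} nM Mtx")
```

A geometric (fold-based) increase is used here only because it is a common, easily stated way to escalate a selective agent; linear or irregular increments would fit the text equally well.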

To extend the DHFR amplification method to other cell types, a mutant
DHFR gene that encodes a protein with reduced sensitivity to methotrexate
may be used in conjunction with host cells that contain normal numbers of
an endogenous wild-type DHFR gene. Simonsen and Levinson, Proc. Natl.
Acad. Sci. USA, 80:2495 (1983); Wigler et al., Proc. Natl. Acad. Sci. USA,
77:3567-3570 (1980); Haber and Schimke, Somatic Cell Genetics, 8:499-508
(1982).

Alternatively, host cells may be cotransfected with the product
gene, a DHFR gene, and a dominant selectable gene, such as a neomycin
resistance gene. Kim and Wold, Cell, 42:129 (1985); Capon et al., U.S. Pat.
No. 4,965,199. Transfectants are identified by first culturing the cells in
culture medium containing neomycin (or the related drug G418), and the
transfectants so identified then are selected for amplification of the DHFR
gene and the product gene by exposure to successively increasing amounts of
Mtx.

As will be appreciated from this discussion, the selection of
recombinant host cells that express high levels of a desired protein
generally is a multi-step process. In the first step, initial
transfectants are selected that have incorporated the product gene and the
selectable gene. In subsequent steps, the initial transfectants are
subjected to further selection for high-level expression of the selectable
gene and then to random screening for high-level expression of the product
gene. To identify cells expressing high levels of the desired protein,
one typically must screen large numbers of transfectants, because the
majority of transfectants produce less than maximal levels of the desired
protein. Further, Mtx resistance in DHFR transformants is at least
partially conferred by varying degrees of gene amplification. Schimke,
Cell, 37:705-713 (1984). The inadequacies of co-expression of the
non-selected gene have been reported by Wold et al., Proc. Natl. Acad. Sci.
USA, 76:5684-5688 (1979). Instability of the amplified DNA is reported by
Kaufman and Schimke, Mol. Cell. Biol., 1:1069-1076 (1981); Haber and
Schimke, Cell, 26:355-362 (1981); and Federspiel et al., J. Biol. Chem.,
259:9127-9140 (1984).

Several methods have been described for directly selecting such
recombinant host cells in a single step. One strategy involves
cotransfecting host cells with a product gene and a DHFR gene, and selecting
those cells that express high levels of DHFR by directly culturing in
medium containing a high concentration of Mtx. Many of the cells selected
in that manner also express the cotransfected product gene at high levels.
Page and Sydenham, Bio/Technology, 9:64 (1991). This method for single-step
selection suffers from certain drawbacks that limit its usefulness.
High-expressing cells obtained by direct culturing in medium containing a
high level of a selection agent may have poor growth and stability
characteristics, thus limiting their usefulness for long-term production
processes. Page and Sydenham, Bio/Technology, 9:64 (1991). Single-step
selection for high-level resistance to Mtx may produce cells with an
altered, Mtx-resistant DHFR enzyme, or cells that have altered Mtx
transport properties, rather than cells containing amplified genes. Haber
et al., J. Biol. Chem., 256:9501 (1981); Assaraf and Schimke, Proc. Natl.
Acad. Sci. USA, 84:7154 (1987).

Another method involves the use of polycistronic mRNA expression
vectors containing a product gene at the 5' end of the transcribed region
and a selectable gene at the 3' end. Because translation of the selectable
gene at the 3' end of the polycistronic mRNA is inefficient, such vectors
exhibit preferential translation of the product gene, and transfected cells
require high levels of polycistronic mRNA to survive selection. Kaufman,
Meth. Enzymol., 185:487 (1990); Kaufman, Meth. Enzymol., 185:537 (1990);
Kaufman et al., EMBO J., 6:187 (1987). Accordingly, cells expressing high
levels of the desired protein product may be obtained in a single step by
culturing the initial transfectants in medium containing a selection agent
appropriate for use with the particular selectable gene. However, the
utility of these vectors is variable because of the unpredictable influence
of the upstream product reading frame on selectable marker translation and
because the upstream reading frame sometimes becomes deleted during
methotrexate amplification (Kaufman et al., J. Mol. Biol., 159:601-621
[1982]; Levinson, Methods in Enzymology, San Diego: Academic Press, Inc.
[1990]). Later vectors incorporated an internal translation initiation
site, derived from members of the picornavirus family, positioned between
the product gene and the selectable gene (Pelletier et al., Nature, 334:320
[1988]; Jang et al., J. Virol., 63:1651 [1989]).

A third method for single-step selection involves use of a DNA
construct with a selectable gene containing an intron within which is
located a gene encoding the protein of interest. See U.S. Patent No.
5,043,270 and Abrams et al., J. Biol. Chem., 264(24):14016-14021 (1989).
In yet another single-step selection method, host cells are cotransfected
with an intron-modified selectable gene and a gene encoding the protein of
interest. See WO 92/17566, published October 15, 1992. The intron-modified
gene is prepared by inserting into the transcribed region of a
selectable gene an intron of such length that the intron is correctly
spliced from the corresponding mRNA precursor at low efficiency, so that
the amount of selectable marker produced from the intron-modified
selectable gene is substantially less than that produced from the starting
selectable gene. These vectors help to ensure the integrity of the
integrated DNA construct, but transcriptional linkage is not achieved
because the selectable gene and the product gene are driven by separate
promoters.

Other mammalian expression vectors that have single transcription
units have been described. Retroviral vectors have been constructed (Cepko
et al., Cell, 37:1053-1062 [1984]) in which a cDNA is inserted between the
endogenous Moloney murine leukemia virus (M-MuLV) splice donor and splice
acceptor sites, which are followed by a neomycin resistance gene. This
vector has been used to express a variety of gene products following
retroviral infection of several cell types.

With the above drawbacks in mind, it is one object of the present
invention to increase the level of homogeneity with regard to expression
levels of stable clones transfected with a product gene of interest, by
expressing a selectable marker (DHFR) and the protein of interest from a
single promoter.

It is another object to provide a method for selecting stable,
recombinant host cells that express high levels of a desired protein
product, which method is rapid and convenient to perform, and reduces the
number of transfected cells which need to be screened. Furthermore, it is
an object to allow high levels of single-unit and two-unit polypeptides to
be rapidly generated from clones or pools of stable host cell transfectants.

It is an additional object to provide expression vectors which bias
for active integration events (i.e., have an increased tendency to generate
transformants wherein the DNA construct is inserted into a region of the
genome of the host cell which results in high-level expression of the
product gene) and can accommodate a variety of product genes without the
need for modification.

SUMMARY OF THE INVENTION

Accordingly, the present invention is directed to a DNA construct
(alternatively termed a DNA molecule) comprising a 5' transcriptional
initiation site and a 3' transcriptional termination site; a selectable
gene (preferably an amplifiable gene) and a product gene provided 3' to the
selectable gene; and a transcriptional regulatory region regulating
transcription of both the selectable gene and the product gene, the
selectable gene being positioned within an intron defined by a splice donor
site and a splice acceptor site. The splice donor site preferably comprises
an effective splice donor sequence as herein defined and thereby regulates
expression of the product gene using the transcriptional regulatory region.
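The relative order of elements recited above can be expressed as a small check over a 5'-to-3' list of construct features. This is a sketch only: the element labels and the function `is_valid_layout` are hypothetical names introduced here for illustration, and the check captures only the element order and the single transcriptional regulatory region, not sequence-level requirements such as the strength of the splice donor.

```python
# Hypothetical labels for construct elements, in the required 5' -> 3' order.
ORDER = (
    "promoter",         # transcriptional regulatory region (exactly one)
    "splice_donor",     # 5' boundary of the intron
    "selectable_gene",  # e.g. DHFR, lying inside the intron
    "splice_acceptor",  # 3' boundary of the intron
    "product_gene",     # provided 3' to the selectable gene
    "terminator",       # transcriptional termination site
)


def is_valid_layout(elements: list[str]) -> bool:
    """Return True if each required element occurs once, in the recited order."""
    if any(elements.count(name) != 1 for name in ORDER):
        return False  # missing element, or e.g. a second promoter
    positions = [elements.index(name) for name in ORDER]
    # Strictly increasing positions: promoter < SD < selectable < SA < product < terminator.
    return all(a < b for a, b in zip(positions, positions[1:]))


dicistronic = ["promoter", "splice_donor", "selectable_gene",
               "splice_acceptor", "product_gene", "terminator"]
swapped = ["promoter", "splice_donor", "product_gene",
           "splice_acceptor", "selectable_gene", "terminator"]
two_promoters = ["promoter", "splice_donor", "selectable_gene", "splice_acceptor",
                 "promoter", "product_gene", "terminator"]
print(is_valid_layout(dicistronic), is_valid_layout(swapped), is_valid_layout(two_promoters))
```

Elements not named in `ORDER` (for instance an enhancer label) are simply ignored, so the check tolerates additional features as long as the recited ones keep their order.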

In another embodiment, the invention provides a method for producing
a product of interest comprising culturing a eukaryotic cell which has been
transfected with the DNA construct described above, so as to express the
product gene, and recovering the product.

In a further embodiment, the invention provides a method for
producing eukaryotic cells having multiple copies of the product gene,
comprising transfecting eukaryotic cells with the DNA construct described
above (where the selectable gene is an amplifiable gene), growing the cells
in a selective medium comprising an amplifying agent for a sufficient time
for amplification to occur, and selecting cells having multiple copies of
the product gene. Preferably, transfection of the cells is achieved using
electroporation.

After transfection of the host cells, most of the transfectants fail
to exhibit the selectable phenotype characteristic of the protein encoded
by the selectable gene, but surprisingly a small proportion of the
transfectants do exhibit the selectable phenotype, and among those
transfectants, the majority are found to express high levels of the desired
product encoded by the product gene. Thus, the invention provides an
improved method for the selection of recombinant host cells expressing high
levels of a desired product, which method is useful with a wide variety of
eukaryotic host cells and avoids the problems inherent in existing cell
selection technology.
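The selection logic summarized above can be illustrated with a deliberately simplified model. The sketch assumes, purely for illustration, that a fraction `splice_efficiency` of primary transcripts is spliced (removing the intron that carries the selectable gene and yielding product mRNA), while the unspliced remainder supplies the selectable marker. Units are arbitrary, `marker_threshold` is a hypothetical survival cutoff, and translation efficiency, mRNA stability and amplification are all ignored; the function names are invented for this example.

```python
def marker_and_product(transcription: float, splice_efficiency: float) -> tuple[float, float]:
    """Split one cell's primary transcripts into marker and product output (toy model)."""
    if not 0.0 <= splice_efficiency <= 1.0:
        raise ValueError("splice_efficiency must lie between 0 and 1")
    marker = transcription * (1.0 - splice_efficiency)  # unspliced: intron with selectable gene retained
    product = transcription * splice_efficiency         # spliced: intron removed, product mRNA
    return marker, product


def survivors(transcription_levels: list[float], splice_efficiency: float,
              marker_threshold: float) -> list[tuple[float, float]]:
    """Return (transcription, product) for cells whose marker output meets the threshold."""
    kept = []
    for level in transcription_levels:
        marker, product = marker_and_product(level, splice_efficiency)
        if marker >= marker_threshold:
            kept.append((level, product))
    return kept


# Hypothetical cells spanning a 100-fold range of transcription (arbitrary units).
levels = [1.0, 10.0, 100.0]
print(survivors(levels, splice_efficiency=0.9, marker_threshold=5.0))  # efficient splice donor
print(survivors(levels, splice_efficiency=0.1, marker_threshold=5.0))  # weak splice donor
```

In this toy model an efficient splice donor lets only the most highly transcribing cell pass selection, and that cell also has the highest product output; with a weak splice donor, a cell with ten-fold lower transcription also survives. This is only a qualitative analogy to the observation that few transfectants survive but most survivors are high expressers.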
BRIEF DESCRIPTION OF THE DRAWINGS
Figures 1A-1D illustrate schematically various DNA constructs
encompassed by the instant invention. The large arrows represent the
selectable gene and the product gene; the V formed by the dashed lines
shows the region of the precursor RNA internal to the 5' splice donor site
(SD) and 3' splice acceptor site (SA) that is excised from vectors that
contain a functional SD. The transcriptional regulatory region, selectable
gene, product gene and transcriptional termination site are depicted in
Figure 1A. Figure 1B depicts the DNA constructs of Example 1. The various
splice donor sequences are depicted, i.e., wild-type ras splice donor
sequence (WT ras), mutant ras splice donor sequence (MUTANT ras) and
non-functional splice donor sequence (ΔGT). The probes used for Northern
blot analysis in Example 1 are shown in Figure 1B. Figure 1C depicts the
DNA constructs of Example 2 and Figure 1D depicts the DNA construct of
Example 3 used for expression of anti-IgE V^.

Figure 2 depicts schematically the control DNA construct used in
Example 1.

Figures 3A-Q depict the nucleotide sequence (SEQ ID NO: 1) of the 
DHFR/intron-(WT ras SD)-tPA expression vector of Example 1.

Figure 4 is a bar graph which shows the number of colonies that form
in selective medium after electroporation of linearized duplicate miniprep
DNAs prepared in parallel from the three vectors shown in Figure 1B (i.e.,
with wild-type ras splice donor sequence [WT ras], mutant ras splice donor
sequence [MUTANT ras] and non-functional splice donor sequence [ΔGT]) and
from the control vector that has DHFR under control of the SV40 promoter and
tPA under control of the CMV promoter (see Figure 2). Cells were selected in
nucleoside-free medium and counted with an automated colony counter.

Figures 5A-C are bar graphs depicting expression of tPA from stable
pools and clones generated from the vectors shown in Figure 1B. In Figure
5A, greater than 100 clones from each vector transfection were mixed, plated
in 24-well plates, and assayed by tPA ELISA at "saturation". In Figure 5B,
twenty clones chosen at random derived from each of the vectors were
assayed by tPA ELISA at "saturation". In Figure 5C, the pools mentioned in
Figure 5A (except the ΔGT pool) were exposed to 200 nM Mtx to select for
DHFR amplification and then pooled and assayed for tPA expression.

Figures 6A-P depict the nucleotide sequence (SEQ ID NO: 2) of the 
DHFR/intron-(WT ras SD)-TNFr-IgG expression vector of Example 2.

Figures 7A-B are bar graphs depicting expression of TNFr-IgG using
dicistronic or control vectors (see Example 2). Vectors containing TNFr-IgG
(but otherwise identical to those described for tPA expression in
Example 1) were constructed (see Figure 1C), introduced into dp12.CHO cells
by electroporation, pooled, and assayed for product expression before
(Figure 7A) and after (Figure 7B) being subjected to amplification in
200 nM Mtx.

Figure 8 depicts schematically the DNA construct used for expression
of anti-IgE in Example 3.
Figures 9A-O depict the nucleotide sequence (SEQ ID NO: 3) of the
anti-IgE V„ expression vector of Example 3.

Figures 10A-Q depict the nucleotide sequence (SEQ ID NO: 4) of the
anti-IgE V,, expression vector of Example 3.

Figure 11 is a bar graph depicting anti-IgE expression in Example 3.
Heavy (VH) and light (VL) chain expression vectors were constructed and
co-electroporated into CHO cells, and clones were selected and assayed for
antibody expression. Additionally, pools were established and assessed
with regard to expression before and after Mtx selection at 200 nM and 1 µM.

DESCRIPTION OF THE PREFERRED EMBODIMENTS

Definitions:

The "DNA construct" disclosed herein comprises a non-naturally
occurring DNA molecule which can either be provided as an isolate or
integrated in another DNA molecule, e.g., in an expression vector or the
chromosome of a eukaryotic host cell.

The term "selectable gene" as used herein refers to a DNA that
encodes a selectable marker necessary for the growth or survival of a host
cell under the particular cell culture conditions chosen. Accordingly, a
host cell that is transformed with a selectable gene will be capable of
growth or survival under certain cell culture conditions wherein a
non-transfected host cell is not capable of growth or survival. Typically, a
selectable gene will confer resistance to a drug or compensate for a
metabolic or catabolic defect in the host cell. Examples of selectable
genes are provided in the following table. See also Kaufman, Methods in
Enzymology, 185:537-566 (1990), for a review of these.



TABLE 1

Selectable Genes and Their Selection Agents

Selection Agent                                 Selectable Gene
Methotrexate                                    Dihydrofolate reductase
Cadmium                                         Metallothionein
PALA                                            CAD
Xyl-A or adenosine and 2'-deoxycoformycin       Adenosine deaminase
Adenine, azaserine, and coformycin              Adenylate deaminase
6-Azauridine, pyrazofuran                       UMP synthetase
Mycophenolic acid                               IMP 5'-dehydrogenase
Mycophenolic acid with limiting xanthine        Xanthine-guanine phosphoribosyltransferase
Hypoxanthine, aminopterin, and thymidine (HAT)  Mutant HGPRTase or mutant thymidine kinase
5-Fluorodeoxyuridine                            Thymidylate synthetase
Multiple drugs, e.g., adriamycin, vincristine,  P-glycoprotein 170
  or colchicine
Aphidicolin                                     Ribonucleotide reductase
Methionine sulfoximine                          Glutamine synthetase
β-Aspartyl hydroxamate or albizziin             Asparagine synthetase
Canavanine                                      Arginosuccinate synthetase
α-Difluoromethylornithine                       Ornithine decarboxylase
Compactin                                       HMG-CoA reductase
Tunicamycin                                     N-Acetylglucosaminyl transferase
Borrelidin                                      Threonyl-tRNA synthetase
Ouabain                                         Na+K+-ATPase



The preferred selectable gene is an amplifiable gene. As used herein,
the term "amplifiable gene" refers to a gene which is amplified (i.e.,
additional copies of the gene are generated which survive in
intrachromosomal or extrachromosomal form) under certain conditions. The
amplifiable gene usually encodes an enzyme (i.e., an amplifiable marker)
which is required for growth of eukaryotic cells under those conditions.
For example, the gene may encode DHFR, which is amplified when a host cell
transformed therewith is grown in Mtx. According to Kaufman, the selectable
genes in Table 1 above can also be considered amplifiable genes. An example
of a selectable gene which is generally not considered to be an amplifiable
gene is the neomycin resistance gene (Cepko et al., supra).

As used herein, "selective medium" refers to a nutrient solution used
for growing eukaryotic cells which have the selectable gene and which
therefore includes a "selection agent". Commercially available media such as
Ham's F10 (Sigma), Minimal Essential Medium ([MEM], Sigma), RPMI-1640
(Sigma), and Dulbecco's Modified Eagle's Medium ([DMEM], Sigma) are
exemplary nutrient solutions. In addition, any of the media described in Ham
and Wallace, Meth. Enz., 58:44 (1979), Barnes and Sato, Anal. Biochem.,
102:255 (1980), U.S. Patent Nos. 4,767,704; 4,657,866; 4,927,762; or
4,560,655; WO 90/03430; WO 87/00195; U.S. Patent Re. 30,985; or U.S. Patent
No. 5,122,469, the disclosures of all of which are incorporated herein by
reference, may be used as culture media. Any of these media may be
supplemented as necessary with hormones and/or other growth factors (such
as insulin, transferrin, or epidermal growth factor), salts (such as sodium
chloride, calcium, magnesium, and phosphate), buffers (such as HEPES),
nucleosides (such as adenosine and thymidine), antibiotics (such as
GENTAMYCIN™ drug), trace elements (defined as inorganic compounds usually
present at final concentrations in the micromolar range), and glucose or
an equivalent energy source. Any other necessary supplements may also be
included at appropriate concentrations that would be known to those skilled
in the art. The preferred nutrient solution comprises fetal bovine serum.

The term "selection agent" refers to a substance that interferes with
the growth or survival of a host cell that is deficient in a particular
selectable gene. Examples of selection agents are presented in Table 1
above. The selection agent preferably comprises an "amplifying agent", which
is defined for purposes herein as an agent for amplifying copies of the
amplifiable gene, such as Mtx if the amplifiable gene is DHFR. See Table
1 for examples of amplifying agents.

As used herein, the term "transcriptional initiation site" refers to
the nucleotide in the DNA construct corresponding to the first nucleotide
incorporated into the primary transcript, i.e., the mRNA precursor,
which site is generally provided at, or adjacent to, the 5' end of the DNA
construct.

The term "transcriptional termination site" refers to a sequence of 
DNA, normally represented at the 3' end of the DNA construct, that causes 
RNA polymerase to terminate transcription. 

As used herein, "transcriptional regulatory region" refers to a
region of the DNA construct that regulates transcription of the selectable
gene and the product gene. The transcriptional regulatory region normally
refers to a promoter sequence (i.e., a region of DNA involved in binding of
RNA polymerase to initiate transcription), which can be constitutive or
inducible, and, optionally, an enhancer (i.e., a cis-acting DNA element,
usually from about 10-300 bp, that acts on a promoter to increase its
transcription).

As used herein, "product gene" refers to DNA that encodes a desired
protein or polypeptide product. Any product gene that is capable of
expression in a host cell may be used, although the methods of the
invention are particularly suited for obtaining high-level expression of
a product gene that is not also a selectable or amplifiable gene.
Accordingly, the protein or polypeptide encoded by a product gene typically
will be one that is not necessary for the growth or survival of a host cell
under the particular cell culture conditions chosen. For example, product
genes suitably encode a peptide, or may encode a polypeptide sequence of
amino acids for which the chain length is sufficient to produce higher
levels of tertiary and/or quaternary structure.

Examples of bacterial polypeptides or proteins include, e.g.,
alkaline phosphatase and β-lactamase. Examples of mammalian polypeptides
or proteins include molecules such as renin; a growth hormone, including
human growth hormone and bovine growth hormone; growth hormone releasing
factor; parathyroid hormone; thyroid stimulating hormone; lipoproteins;
alpha-1-antitrypsin; insulin A-chain; insulin B-chain; proinsulin; follicle
stimulating hormone; calcitonin; luteinizing hormone; glucagon; clotting
factors such as factor VIIIC, factor IX, tissue factor, and von Willebrand's
factor; anti-clotting factors such as Protein C; atrial natriuretic factor;
lung surfactant; a plasminogen activator, such as urokinase or human urine
or tissue-type plasminogen activator (t-PA); bombesin; thrombin;
hemopoietic growth factor; tumor necrosis factor-alpha and -beta;
enkephalinase; RANTES (regulated on activation normally T-cell expressed
and secreted); human macrophage inflammatory protein (MIP-1-alpha); a serum
albumin such as human serum albumin; mullerian-inhibiting substance;
relaxin A-chain; relaxin B-chain; prorelaxin; mouse gonadotropin-associated
peptide; a microbial protein, such as beta-lactamase; DNase; inhibin;
activin; vascular endothelial growth factor (VEGF); receptors for hormones
or growth factors; integrin; protein A or D; rheumatoid factors; a
neurotrophic factor such as brain-derived neurotrophic factor (BDNF),
neurotrophin-3, -4, -5, or -6 (NT-3, NT-4, NT-5, or NT-6), or a nerve
growth factor such as NGF-β; platelet-derived growth factor (PDGF);
fibroblast growth factor such as aFGF and bFGF; epidermal growth factor
(EGF); transforming growth factor (TGF) such as TGF-alpha and TGF-beta,
including TGF-β1, TGF-β2, TGF-β3, TGF-β4, or TGF-β5; insulin-like growth
factor-I and -II (IGF-I and IGF-II); des(1-3)-IGF-I (brain IGF-I);
insulin-like growth factor binding proteins; CD proteins such as CD-3, CD-4,
CD-8, and CD-19; erythropoietin; osteoinductive factors; immunotoxins; a
bone morphogenetic protein (BMP); an interferon such as interferon-alpha,
-beta, and -gamma; colony stimulating factors (CSFs), e.g., M-CSF, GM-CSF,
and G-CSF; interleukins (ILs), e.g., IL-1 to IL-10; superoxide dismutase;
T-cell receptors; surface membrane proteins; decay accelerating factor;
viral antigen such as, for example, a portion of the AIDS envelope;
transport proteins; homing receptors; addressins; regulatory proteins;
antibodies; chimeric proteins such as immunoadhesins; and fragments of any
of the above-listed polypeptides.

The product gene preferably does not consist of an anti-sense
sequence for inhibiting the expression of a gene present in the host.
Preferred proteins herein are therapeutic proteins such as TGF-β, TGF-α,
PDGF, EGF, FGF, IGF-I, DNase, plasminogen activators such as t-PA, clotting
factors such as tissue factor and factor VIII, hormones such as relaxin and
insulin, cytokines such as IFN-γ, chimeric proteins such as TNF receptor
IgG immunoadhesin (TNFr-IgG), or antibodies such as anti-IgE.

The term "intron" as used herein refers to a nucleotide sequence
present within the transcribed region of a gene or within a messenger RNA
precursor, which nucleotide sequence is capable of being excised, or
spliced, from the messenger RNA precursor by a host cell prior to
translation. Introns suitable for use in the present invention may be
prepared by any of several methods that are well known in the art, such as
purification from a naturally occurring nucleic acid or de novo synthesis.
The introns present in many naturally occurring eukaryotic genes have been
identified and characterized. Mount, Nuc. Acids Res., 10:459 (1982).
Artificial introns comprising functional splice sites also have been
described. Winey et al., Mol. Cell Biol., 9:329 (1989); Gatermann et al.,
Mol. Cell Biol., 9:1526 (1989). Introns may be obtained from naturally
occurring nucleic acids, for example, by digestion of a naturally occurring
nucleic acid with a suitable restriction endonuclease, or by PCR cloning
using primers complementary to sequences at the 5' and 3' ends of the
intron. Alternatively, introns of defined sequence and length may be
prepared synthetically using various methods in organic chemistry. Narang
et al., Meth. Enzymol., 68:90 (1979); Caruthers et al., Meth. Enzymol.,
154:287 (1985); Froehler et al., Nuc. Acids Res., 14:5399 (1986).

As used herein, "splice donor site" or "SD" refers to the DNA sequence
immediately surrounding the exon-intron boundary at the 5' end of the
intron, where the "exon" comprises the nucleic acid 5' to the intron. Many
splice donor sites have been characterized, and Ohshima et al., J. Mol.
Biol., 195:247-259 (1987) provides a review of these. An "efficient splice
donor sequence" refers to a nucleic acid sequence encoding a splice donor
site wherein the efficiency of splicing of messenger RNA precursors having
the splice donor sequence is between about 80 and 99%, and preferably 90 to
95%, as determined by quantitative PCR. Examples of efficient splice donor
sequences include the wild type (WT) ras splice donor sequence and the
GACrGTAAGT sequence of Example 3. Other efficient splice donor sequences
can be readily selected using the techniques for measuring the efficiency
of splicing disclosed herein.

The terms "PCR" and "polymerase chain reaction" as used herein refer
to the in vitro amplification method described in US Patent No. 4,683,195
(issued July 28, 1987). In general, the PCR method involves repeated
cycles of primer extension synthesis, using two DNA primers capable of
hybridizing preferentially to a template nucleic acid comprising the
nucleotide sequence to be amplified. The PCR method can be used to clone
specific DNA sequences from total genomic DNA, cDNA transcribed from
cellular RNA, or viral or plasmid DNAs. Wang & Mark, in PCR Protocols,
pp. 70-75 (Academic Press, 1990); Scharf, in PCR Protocols, pp. 84-98;
Kawasaki & Wang, in PCR Technology, pp. 89-97 (Stockton Press, 1989).
Reverse transcription-polymerase chain reaction (RT-PCR) can be used to
analyze RNA samples containing mixtures of spliced and unspliced mRNA
transcripts. Fluorescently tagged primers designed to flank the intron
(so that the amplified region spans it) are used to amplify both spliced
and unspliced targets. The resultant amplification products are then
separated by gel electrophoresis and quantitated by measuring the
fluorescent emission of the appropriate band(s). A comparison is made to
determine the amount of spliced and unspliced transcripts present in the
RNA sample.
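
By way of illustration only, the comparison of spliced and unspliced
RT-PCR products described above reduces to a simple ratio. The Python
sketch below assumes that each amplification product carries a single
fluorescent label, so that background-subtracted band fluorescence is
proportional to the number of molecules in each band, and that both
targets amplify with similar efficiency. The names `BandReading`,
`splicing_efficiency`, and `classify_splice_site` and the band values are
hypothetical; the thresholds are the ranges given in the definitions of
efficient splice sites herein.

```python
from dataclasses import dataclass


@dataclass
class BandReading:
    """Background-subtracted fluorescence of the two RT-PCR product bands (hypothetical)."""
    spliced: float
    unspliced: float


def splicing_efficiency(reading: BandReading) -> float:
    """Fraction of transcripts that were spliced (0.0 to 1.0).

    Assumes one fluorescent label per product molecule, so band
    fluorescence is proportional to the number of molecules.
    """
    total = reading.spliced + reading.unspliced
    if total <= 0:
        raise ValueError("no signal in either band")
    return reading.spliced / total


def classify_splice_site(efficiency: float) -> str:
    """Bin an efficiency using the 'about 80-99%, preferably 90-95%' ranges."""
    if 0.90 <= efficiency <= 0.95:
        return "efficient (preferred range)"
    if 0.80 <= efficiency <= 0.99:
        return "efficient"
    return "outside the efficient range"


reading = BandReading(spliced=4600.0, unspliced=400.0)  # made-up band values
eff = splicing_efficiency(reading)
print(f"{eff:.1%}", classify_splice_site(eff))  # prints: 92.0% efficient (preferred range)
```

Because spliced and unspliced products differ in length, a practical assay
would also need to confirm that neither product is preferentially
amplified; the sketch does not model that.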

One preferred splice donor sequence is a "consensus splice donor
sequence". The nucleotide sequences surrounding intron splice sites, which
sequences are evolutionarily highly conserved, are referred to as
"consensus splice donor sequences". In the mRNAs of higher eukaryotes, the
5' splice site occurs within the consensus sequence AG:GUAAGU (wherein the
colon denotes the site of cleavage and ligation). In the mRNAs of yeast,
the 5' splice site is bounded by the consensus sequence :GUAUGU. Padgett
et al., Ann. Rev. Biochem., 55:1119 (1986).
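
A minimal sketch of how a candidate sequence might be screened against the
higher-eukaryote consensus AG:GUAAGU is shown below. It works on DNA (T in
place of U), requires a GT dinucleotide at the start of the intron (a
constraint assumed here, reflecting the near-invariant GU at 5' splice
sites), and reports the cleavage position together with the number of
positions matching the consensus. The function names and the `min_score`
cutoff are hypothetical, and a match score is not a measure of splicing
efficiency, which must still be determined experimentally, for example by
quantitative PCR.

```python
CONSENSUS_DONOR = "AGGTAAGT"  # DNA form of AG:GUAAGU
CUT_OFFSET = 2                # cleavage falls between the exon AG and the intron GT


def donor_score(window: str) -> int:
    """Number of positions in an 8-nt window that match the consensus (hypothetical helper)."""
    return sum(a == b for a, b in zip(window, CONSENSUS_DONOR))


def find_donor_candidates(seq: str, min_score: int = 6) -> list[tuple[int, int]]:
    """Return (cleavage_index, score) for candidate 5' splice sites.

    cleavage_index is the index of the first intron base (the G of GT).
    Only windows with GT at the intron start are considered.
    """
    seq = seq.upper().replace("U", "T")
    width = len(CONSENSUS_DONOR)
    hits = []
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        if window[CUT_OFFSET:CUT_OFFSET + 2] != "GT":
            continue
        score = donor_score(window)
        if score >= min_score:
            hits.append((i + CUT_OFFSET, score))
    return hits


print(find_donor_candidates("CCCAGGTAAGTCCC"))  # [(5, 8)]
print(find_donor_candidates("CCCAGGTGAGTCCC"))  # [(5, 7)], one mismatch
```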

The expression "splice acceptor site" or "SA" refers to the sequence
immediately surrounding the intron-exon boundary at the 3' end of the
intron, where the "exon" comprises the nucleic acid 3' to the intron. Many
splice acceptor sites have been characterized, and Ohshima et al., J. Mol.
Biol., 195:247-259 (1987) provides a review of these. The preferred splice
acceptor site is an efficient splice acceptor site, which refers to a
nucleic acid sequence encoding a splice acceptor site wherein the
efficiency of splicing of messenger RNA precursors having the splice
acceptor site is between about 80 and 99%, and preferably 90 to 95%, as
determined by quantitative PCR. The splice acceptor site may comprise a
consensus sequence. In the mRNAs of higher eukaryotes, the 3' splice
acceptor site occurs within the consensus sequence (U/C)11NCAG:G, where
(U/C)11 denotes a run of eleven pyrimidines and N denotes any nucleotide.
In the mRNAs of yeast, the 3' acceptor splice site is bounded by the
consensus sequence (C/U)AG:. Padgett et al., supra.
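
By way of illustration only, the higher-eukaryote acceptor consensus
(U/C)11NCAG:G can be expressed as a regular expression. The sketch below
assumes DNA input and reports, for each candidate, the index of the first
exon base. As an added assumption not taken from the consensus itself, it
tolerates a tract that is mostly rather than entirely pyrimidine (the
hypothetical `min_pyrimidines` parameter; 11 gives the strict consensus).
Names and defaults are illustrative only.

```python
import re

# (U/C)11 N C A G : G written for DNA; the lookahead allows overlapping hits.
ACCEPTOR_PATTERN = re.compile(r"(?=([ACGT]{11})[ACGT]CAGG)")
TRACT_LEN = 11


def find_acceptor_candidates(seq: str, min_pyrimidines: int = 9) -> list[int]:
    """Return indices of the first exon base for candidate 3' splice sites.

    The 11-nt tract must contain at least min_pyrimidines C/T bases
    (a relaxation assumed for this sketch).
    """
    seq = seq.upper().replace("U", "T")
    hits = []
    for m in ACCEPTOR_PATTERN.finditer(seq):
        tract = m.group(1)
        if sum(base in "CT" for base in tract) >= min_pyrimidines:
            # tract (11) + N (1) + CAG (3): cleavage falls just before the exon G
            hits.append(m.start() + TRACT_LEN + 4)
    return hits


print(find_acceptor_candidates("TTTCTTCCTTTACAGGCC"))  # [15]
```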

As used herein, "culturing for sufficient time to allow amplification
to occur" refers to the act of physically culturing the eukaryotic host
cells which have been transformed with the DNA construct in cell culture
media containing the amplifying agent, until the copy number of the
amplifiable gene (and preferably also the copy number of the product gene)
in the host cells has increased relative to the transformed cells prior to
this culturing.
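
As a sketch only, the increase in copy number referred to in this
definition may be expressed as a fold change. The example assumes that
hybridization signals (for example, Southern or dot blot band
intensities) are available for the amplifiable gene and for a single-copy
reference gene, both within the linear range of detection, before and
after culture in the amplifying agent; normalizing to the reference
corrects for differences in the amount of DNA loaded. The function names
and numbers are hypothetical.

```python
def normalized_signal(target: float, reference: float) -> float:
    """Amplifiable-gene signal per unit of single-copy reference signal."""
    if reference <= 0:
        raise ValueError("reference signal must be positive")
    return target / reference


def fold_amplification(before: tuple[float, float], after: tuple[float, float]) -> float:
    """Relative copy-number change of the amplifiable gene (hypothetical helper).

    Each argument is (target_signal, reference_signal), measured the same
    way before and after culture in the amplifying agent.
    """
    return normalized_signal(*after) / normalized_signal(*before)


fold = fold_amplification(before=(1000.0, 1000.0), after=(24000.0, 1200.0))  # made-up signals
print(fold, fold > 1.0)  # 20.0 True
```

A fold change above 1.0 indicates that amplification, in the sense defined
above, has occurred.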

The term "expression" as used herein refers to transcription or
translation occurring within a host cell. The level of expression of a
product gene in a host cell may be determined on the basis of either the
amount of corresponding mRNA that is present in the cell or the amount of
the protein encoded by the product gene that is produced by the cell. For
example, mRNA transcribed from a product gene is desirably quantitated by
northern hybridization. Sambrook et al., Molecular Cloning: A Laboratory
Manual, pp. 7.3-7.57 (Cold Spring Harbor Laboratory Press, 1989). Protein
encoded by a product gene can be quantitated either by assaying for the
biological activity of the protein or by employing assays that are
independent of such activity, such as western blotting or radioimmunoassay
using antibodies that are capable of reacting with the protein. Sambrook
et al., Molecular Cloning: A Laboratory Manual, pp. 18.1-18.88 (Cold Spring
Harbor Laboratory Press, 1989).

Modes for Carrying Out the Invention 

Methods and compositions are provided for enhancing the stability
and/or copy number of a transcribed sequence in order to allow for elevated
levels of a DNA sequence of interest. In general, the methods of the
present invention involve transfecting a eukaryotic host cell with an
expression vector comprising both a product gene encoding a desired
polypeptide and a selectable gene (preferably an amplifiable gene).

Selectable genes and product genes may be obtained from genomic DNA,
cDNA transcribed from cellular RNA, or by in vitro synthesis. For example,
libraries are screened with probes (such as antibodies or oligonucleotides
of about 20-80 bases) designed to identify the selectable gene or the
product gene (or the protein(s) encoded thereby). Screening the cDNA or
genomic library with the selected probe may be conducted using standard
procedures as described in chapters 10-12 of Sambrook et al., Molecular
Cloning: A Laboratory Manual (New York: Cold Spring Harbor Laboratory
Press, 1989). An alternative means to isolate the selectable gene or
product gene is to use PCR methodology as described in section 14 of
Sambrook et al., supra.

A preferred method of practicing this invention is to use carefully
selected oligonucleotide sequences to screen cDNA libraries from various
tissues known to contain the selectable gene or product gene. The
oligonucleotide sequences selected as probes should be of sufficient length
and sufficiently unambiguous that false positives are minimized.

The oligonucleotide generally is labeled such that it can be detected
upon hybridization to DNA in the library being screened. The preferred
method of labeling is to use 32P-labeled ATP with polynucleotide kinase,
as is well known in the art, to radiolabel the oligonucleotide. However,
other methods may be used to label the oligonucleotide, including, but not
limited to, biotinylation or enzyme labeling.

Sometimes, the DNA encoding the selectable gene and product gene is
preceded by DNA encoding a signal sequence having a specific cleavage site
at the N-terminus of the mature protein or polypeptide. In general, the
signal sequence may be a component of the expression vector, or it may be
a part of the selectable gene or product gene that is inserted into the
expression vector. If a heterologous signal sequence is used, it
preferably is one that is recognized and processed (i.e., cleaved by a
signal peptidase) by the host cell. For yeast secretion, the native signal
sequence may be substituted by, e.g., the yeast invertase leader, alpha
factor leader (including Saccharomyces and Kluyveromyces α-factor leaders,
the latter described in U.S. Pat. No. 5,010,162 issued 23 April 1991), or
acid phosphatase leader, the C. albicans glucoamylase leader (EP 362,179
published 4 April 1990), or the signal described in WO 90/13646 published
15 November 1990. In mammalian cell expression, the native signal sequence
of the protein of interest is satisfactory, although other mammalian signal
sequences may be suitable, such as signal sequences from secreted
polypeptides of the same or related species, as well as viral secretory
leaders, for example, the herpes simplex gD signal. The DNA for such a
precursor region is ligated in reading frame to the selectable gene or
product gene.

As shown in Figure 1A, the selectable gene is generally provided at
the 5' end of the DNA construct and this selectable gene is followed by the
product gene. Thus, the full-length (non-spliced) message will contain the
DHFR selectable gene as the first open reading frame and so will generate
DHFR protein to allow selection of stable transfectants. The full-length
message is not expected to generate appreciable amounts of the protein of
interest, as the second AUG in a dicistronic message is an inefficient
initiator of translation in mammalian cells (Kozak, J. Cell Biol.,
115:887-903 [1991]).

The selectable gene is positioned within an intron. Introns are
noncoding nucleotide sequences, normally present within many eukaryotic
genes, which are removed from newly transcribed mRNA precursors in a
multiple-step process collectively referred to as splicing.
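
The consequence of placing the selectable gene within the intron can be
illustrated with a toy model. In the Python sketch below, which uses short
made-up sequences in place of real DHFR and product-gene coding regions,
the first AUG of the unspliced message opens the selectable-gene reading
frame, whereas removal of the intron makes the product-gene AUG the first
one in the mature message. The sequences and helper names are
hypothetical, and the model ignores Kozak context, leaky scanning, and
reinitiation.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}

# Hypothetical toy sequences standing in for real coding regions.
SELECTABLE_ORF = "AUGCGUCGUUAA"  # placeholder for the selectable (e.g., DHFR) ORF
PRODUCT_ORF = "AUGAAAGAAUAG"     # placeholder for the product-gene ORF

EXON1 = "GCCGCCAG"                                        # leader ending in AG (donor side)
INTRON = "GUAAGUCC" + SELECTABLE_ORF + "CCUUUUUUUUUUUACAG"  # GU...AG intron
EXON2 = "GCC" + PRODUCT_ORF + "CC"                        # begins with G (acceptor side)


def splice(pre_mrna: str, donor_cut: int, acceptor_cut: int) -> str:
    """Remove the intron occupying pre_mrna[donor_cut:acceptor_cut]."""
    if not 0 <= donor_cut < acceptor_cut <= len(pre_mrna):
        raise ValueError("invalid splice sites")
    return pre_mrna[:donor_cut] + pre_mrna[acceptor_cut:]


def first_orf(mrna: str) -> str:
    """ORF from the first AUG to the first in-frame stop codon ('' if no AUG)."""
    start = mrna.find("AUG")
    if start < 0:
        return ""
    for i in range(start, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            return mrna[start:i + 3]
    return mrna[start:]  # no stop codon: open to the end


pre_mrna = EXON1 + INTRON + EXON2
mature = splice(pre_mrna, len(EXON1), len(EXON1) + len(INTRON))
print(first_orf(pre_mrna) == SELECTABLE_ORF)  # True: unspliced message
print(first_orf(mature) == PRODUCT_ORF)       # True: spliced message
```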

A single mechanism is thought to be responsible for the splicing of
mRNA precursors in mammalian, plant, and yeast cells. In general, the
process of splicing requires that the 5' and 3' ends of the intron be
correctly cleaved and the resulting ends of the mRNA be accurately joined,
such that a mature mRNA having the proper reading frame for protein
synthesis is produced. Analysis of a variety of naturally occurring and
synthetically constructed mutant genes has shown that nucleotide changes
at many of the positions within the consensus sequences at the 5' and 3'
splice sites have the effect of reducing or abolishing the synthesis of
mature mRNA. Sharp, Science, 235:766 (1987); Padgett et al., Ann. Rev.
Biochem., 55:1119 (1986); Green, Ann. Rev. Genet., 20:671 (1986).
Mutational studies also have shown that RNA secondary structures involving
splicing sites can affect the efficiency of splicing. Solnick, Cell,
43:667 (1985); Konarska et al., Cell, 42:165 (1985).

The length of the intron may also affect the efficiency of splicing.
By making deletion mutations of different sizes within the large intron of
the rabbit beta-globin gene, Wieringa et al., Cell, 37:915 (1984),
determined that the minimum intron length necessary for correct splicing
is about 69 nucleotides. Similar studies of the intron of the adenovirus
E1A region have shown that an intron length of about 78 nucleotides allows
correct splicing to occur, but at reduced efficiency. Increasing the
length of the intron to 91 nucleotides restores normal splicing efficiency,
whereas truncating the intron to 63 nucleotides abolishes correct splicing.
Ulfendahl et al., Nuc. Acids Res., 13:6299 (1985).

To be useful in the invention, the intron must have a length such
that splicing of the intron from the mRNA is efficient. The preparation of
introns of differing lengths is a routine matter, involving methods well
known in the art, such as de novo synthesis or in vitro deletion
mutagenesis of an existing intron. Typically, the intron will have a length
of at least about 150 nucleotides, since introns which are shorter than
this tend to be spliced less efficiently. The intron can be up to 30 kb or
more in length; as a general proposition, however, it is less than about
10 kb.
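
The length considerations above can be summarized, very roughly, as a
screening rule. The cut-offs in this hypothetical Python function are
taken from the figures cited above (about 69 nucleotides as a minimum, at
least about 150 nucleotides as typical, generally less than about 10 kb)
and are guides rather than hard limits; for instance, a 91-nucleotide
adenovirus E1A intron was reported to splice normally. Because the
selectable gene is placed inside the intron, the length that matters is
that of the intron after the insert is added; the 200-nucleotide intron
and 1,500-nucleotide insert used below are made-up values.

```python
def assess_intron_length(length_nt: int) -> str:
    """Coarse, hypothetical guidance derived from the intron lengths discussed above."""
    if length_nt <= 0:
        raise ValueError("length must be positive")
    if length_nt < 69:
        return "too short"   # below the ~69 nt minimum reported for correct splicing
    if length_nt < 150:
        return "marginal"    # may splice, but efficiency can be reduced
    if length_nt <= 10_000:
        return "typical"     # roughly 150 nt to 10 kb
    return "long"            # up to 30 kb or more is possible


def intron_length_with_insert(intron_nt: int, insert_nt: int) -> int:
    """Total intron length once the selectable gene is placed inside it."""
    return intron_nt + insert_nt


for n in (63, 91, intron_length_with_insert(200, 1500), 40_000):
    print(n, assess_intron_length(n))
```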

The intron is modified to contain the selectable gene, which is not
normally present within the intron, using any of the various known methods
for modifying a nucleic acid in vitro. Typically, a selectable gene will be
introduced into an intron by first cleaving the intron with a restriction
endonuclease, and then covalently joining the resulting restriction
fragments to the selectable gene in the correct orientation for host cell
expression, for example by ligation with a DNA ligase enzyme.
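
One way to picture the cleave-and-ligate step is as an edit of the top
strand. The Python sketch below assumes, purely for illustration, that the
intron carries a single EcoRI site (G^AATTC) and that the selectable gene
is supplied as an EcoRI fragment, so that ligation regenerates the site at
both junctions; the enzyme choice, sequences, and function names are
hypothetical. Because such a fragment has identical ends it can ligate in
either orientation, hence the `reverse` flag: the orientation for host
cell expression is the one that places the coding strand of the
selectable gene on the sense strand of the transcript, which in practice
is confirmed by restriction analysis or sequencing.

```python
ECORI_SITE = "GAATTC"
ECORI_CUT = 1  # EcoRI cuts G^AATTC on the top strand, leaving 5' AATT overhangs

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_unique_site(seq: str, site: str) -> int:
    """Index of the single occurrence of site; raise if absent or repeated."""
    hits = [i for i in range(len(seq) - len(site) + 1) if seq.startswith(site, i)]
    if len(hits) != 1:
        raise ValueError(f"expected exactly one {site} site, found {len(hits)}")
    return hits[0]


def ligate_into_intron(intron: str, payload: str, reverse: bool = False) -> str:
    """Top strand of the intron after inserting an EcoRI fragment carrying payload."""
    if ECORI_SITE in payload:
        raise ValueError("payload must not contain an internal EcoRI site")
    cut = find_unique_site(intron, ECORI_SITE) + ECORI_CUT
    body = revcomp(payload) if reverse else payload
    # The fragment's top strand reads AATTC + body + G, so each junction re-forms GAATTC.
    fragment_top = ECORI_SITE[ECORI_CUT:] + body + ECORI_SITE[:ECORI_CUT]
    return intron[:cut] + fragment_top + intron[cut:]


intron = "GTAAGTCC" + ECORI_SITE + "TTTTTTTTTTTCCCAG"  # toy intron, one EcoRI site
sense = ligate_into_intron(intron, "ATGAAATAA")
antisense = ligate_into_intron(intron, "ATGAAATAA", reverse=True)
print(sense)
print(antisense)
```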

The DNA construct is dicistronic, i.e., the selectable gene and
product gene are both under the transcriptional control of a single
transcriptional regulatory region. As mentioned above, the transcriptional
regulatory region comprises a promoter. Suitable promoting sequences for
use with yeast hosts include the promoters for 3-phosphoglycerate kinase
(Hitzeman et al., J. Biol. Chem., 255:2073 [1980]) or other glycolytic
enzymes (Hess et al., J. Adv. Enzyme Reg., 7:149 [1968]; Holland,
Biochemistry, 17:4900 [1978]), such as enolase, glyceraldehyde-3-phosphate
dehydrogenase, hexokinase, pyruvate decarboxylase, phosphofructokinase,
glucose-6-phosphate isomerase, 3-phosphoglycerate mutase, pyruvate kinase,
triosephosphate isomerase, phosphoglucose isomerase, and glucokinase.

Other yeast promoters, which are inducible promoters having the
additional advantage of transcription controlled by growth conditions, are
the promoter regions for alcohol dehydrogenase 2, isocytochrome C, acid
phosphatase, degradative enzymes associated with nitrogen metabolism,
metallothionein, glyceraldehyde-3-phosphate dehydrogenase, and enzymes
responsible for maltose and galactose utilization. Suitable vectors and
promoters for use in yeast expression are further described in Hitzeman et
al., EP 73,657A. Yeast enhancers also are advantageously used with yeast
promoters.

Expression control sequences are known for eukaryotes. Virtually all
eukaryotic genes have an AT-rich region located approximately 25 to 30
bases upstream from the site where transcription is initiated. Another
sequence found 70 to 80 bases upstream from the start of transcription of
many genes is a CXCAAT region, where X may be any nucleotide.
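
As an illustrative sketch, the two upstream elements just described can be
looked for in a candidate promoter sequence. The code assumes that the
index of the first transcribed base is known, treats a 6-base window 25 to
30 bases upstream as AT-rich when at least 80% of its bases are A or T (an
arbitrary threshold chosen here), and reports CXCAAT matches whose first
base lies 70 to 80 bases upstream. These windows and names are
hypothetical simplifications and do not by themselves establish promoter
activity.

```python
import re

CXCAAT = re.compile(r"(?=C[ACGT]CAAT)")  # X = any nucleotide; lookahead allows overlaps


def at_fraction(seq: str) -> float:
    """Fraction of A/T bases in seq (0.0 for an empty string)."""
    return sum(base in "AT" for base in seq) / len(seq) if seq else 0.0


def scan_core_promoter(seq: str, tss: int, at_threshold: float = 0.8) -> dict:
    """Look for the upstream elements described above (hypothetical helper).

    tss is the index of the first transcribed base; index tss - d lies
    d bases upstream of it.
    """
    seq = seq.upper()
    if not 80 <= tss <= len(seq):
        raise ValueError("need at least 80 bases upstream of the start site")
    at_window = seq[tss - 30:tss - 24]  # bases 30 through 25 upstream
    cxcaat = [m.start() - tss for m in CXCAAT.finditer(seq)
              if 70 <= tss - m.start() <= 80]
    return {
        "at_rich": at_fraction(at_window) >= at_threshold,
        "cxcaat_positions": cxcaat,  # negative values = upstream of the start site
    }


bases = ["G"] * 100
bases[25:31] = "CCCAAT"  # begins 75 bases upstream of index 100
bases[70:76] = "TATAAA"  # occupies bases 30 through 25 upstream
promoter = "".join(bases)
print(scan_core_promoter(promoter, tss=100))
# {'at_rich': True, 'cxcaat_positions': [-75]}
```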

Product gene transcription from vectors in mammalian host cells is
controlled by promoters obtained from the genomes of viruses such as
polyoma virus, fowlpox virus (UK 2,211,504 published 5 July 1989),
adenovirus (such as Adenovirus 2), bovine papilloma virus, avian sarcoma
virus, cytomegalovirus, a retrovirus, hepatitis-B virus and most preferably
Simian Virus 40 (SV40); from heterologous mammalian promoters, e.g., the
actin promoter or an immunoglobulin promoter; from heat-shock promoters;
and from the promoter normally associated with the product gene, provided
such promoters are compatible with the host cell systems.

The early and late promoters of the SV40 virus are conveniently
obtained as an SV40 restriction fragment that also contains the SV40 viral
origin of replication. Fiers et al., Nature, 273:113 (1978); Mulligan and
Berg, Science, 209:1422-1427 (1980); Pavlakis et al., Proc. Natl. Acad.
Sci. USA, 78:7398-7402 (1981). The immediate early promoter of the human
cytomegalovirus (CMV) is conveniently obtained as a HindIII E restriction
fragment. Greenaway et al., Gene, 18:355-360 (1982). A system for
expressing DNA in mammalian hosts using the bovine papilloma virus as a
vector is disclosed in U.S. 4,419,446. A modification of this system is
described in U.S. 4,601,978. See also Gray et al., Nature, 295:503-508
(1982) on expressing cDNA encoding immune interferon in monkey cells;
Reyes et al., Nature, 297:598-601 (1982) on expression of human
β-interferon cDNA in mouse cells under the control of a thymidine kinase
promoter from herpes simplex virus; Canaani and Berg, Proc. Natl. Acad.
Sci. USA, 79:5166-5170 (1982) on expression of the human interferon β1
gene in cultured mouse and rabbit cells; and Gorman et al., Proc. Natl.
Acad. Sci. USA, 79:6777-6781 (1982) on expression of bacterial CAT
sequences in CV-1 monkey kidney cells, chicken embryo fibroblasts, Chinese
hamster ovary cells, HeLa cells, and mouse NIH-3T3 cells using the Rous
sarcoma virus long terminal repeat as a promoter.

Preferably the transcriptional regulatory region in higher eukaryotes
comprises an enhancer sequence. Enhancers are relatively orientation and
position independent, having been found 5' (Laimins et al., Proc. Natl.
Acad. Sci. USA, 78:993 [1981]) and 3' (Lusky et al., Mol. Cell Bio.,
3:1108 [1983]) to the transcription unit, within an intron (Banerji et al.,
Cell, 33:729 [1983]), as well as within the coding sequence itself (Osborne
et al., Mol. Cell Bio., 4:1293 [1984]). Many enhancer sequences are now
known from mammalian genes (globin, elastase, albumin, α-fetoprotein, and
insulin). Typically, however, one will use an enhancer from a eukaryotic
cell virus. Examples include the SV40 enhancer on the late side of the
replication origin (bp 100-270), the cytomegalovirus early promoter
enhancer (CMV), the polyoma enhancer on the late side of the replication
origin, and adenovirus enhancers. See also Yaniv, Nature, 297:17-18 (1982)
on enhancing elements for activation of eukaryotic promoters. The enhancer
may be spliced into the vector at a position 5' or 3' to the product gene,
but is preferably located at a site 5' from the promoter.

The DNA construct has a transcriptional initiation site following the
transcriptional regulatory region and a transcriptional termination region
following the product gene (see Figure 1A). These sequences are provided
in the DNA construct using techniques which are well known in the art.

The DNA construct normally forms part of an expression vector which
may have other components such as an origin of replication (i.e., a nucleic
acid sequence that enables the vector to replicate in one or more selected
host cells) and, if desired, one or more additional selectable gene(s).
Construction of suitable vectors containing the desired coding and control
sequences employs standard ligation techniques. Isolated plasmids or DNA
fragments are cleaved, tailored, and religated in the form desired to
generate the plasmids required.

Generally, in cloning vectors the origin of replication is one that
enables the vector to replicate independently of the host chromosomal DNA,
and includes origins of replication or autonomously replicating sequences.
Such sequences are well known. The 2μ plasmid origin of replication is
suitable for yeast, and various viral origins (SV40, polyoma, adenovirus,
VSV or BPV) are useful for cloning vectors in mammalian cells. Generally,
the origin of replication component is not needed for mammalian expression
vectors (the SV40 origin may typically be used only because it contains the
early promoter).

Most expression vectors are "shuttle" vectors, i.e., they are capable
of replication in at least one class of organisms but can be transfected
into another organism for expression. For example, a vector is cloned in
E. coli and then the same vector is transfected into yeast or mammalian
cells for expression even though it is not capable of replicating
independently of the host cell chromosome.

For analysis to confirm correct sequences in plasmids constructed,
plasmids from the transformants are prepared, analyzed by restriction
endonuclease digestion, and/or sequenced by the method of Messing et al.,
Nucleic Acids Res., 9:309 (1981) or by the method of Maxam et al., Methods
in Enzymology, 65:499 (1980).
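
When plasmids are analyzed by restriction digestion, the observed gel bands
are compared with the fragment sizes predicted from the intended sequence.
A minimal Python sketch of that prediction for a complete single-enzyme
digest of a circular plasmid follows; EcoRI (G^AATTC, cut offset 1) and the
toy plasmid sequences are assumptions for illustration, the function names
are hypothetical, and sequencing by the cited methods is not modeled.

```python
def cut_positions(plasmid: str, site: str, cut_offset: int) -> list[int]:
    """Top-strand cut positions in a circular sequence, including sites spanning the origin."""
    plasmid = plasmid.upper()
    n = len(plasmid)
    extended = plasmid + plasmid[:len(site) - 1]  # lets a site wrap around the origin
    return sorted({(i + cut_offset) % n for i in range(n) if extended.startswith(site, i)})


def circular_digest(plasmid: str, site: str, cut_offset: int) -> list[int]:
    """Expected fragment lengths (largest first) for a complete single-enzyme digest."""
    n = len(plasmid)
    cuts = cut_positions(plasmid, site, cut_offset)
    if not cuts:
        return []  # no site: the plasmid stays circular
    if len(cuts) == 1:
        return [n]  # one cut linearizes the plasmid
    sizes = [b - a for a, b in zip(cuts, cuts[1:])]
    sizes.append(n - cuts[-1] + cuts[0])  # fragment spanning the origin
    return sorted(sizes, reverse=True)


toy_plasmid = "GAATTC" + "A" * 94 + "GAATTC" + "T" * 44  # 150 bp, two EcoRI sites
print(circular_digest(toy_plasmid, "GAATTC", 1))  # [100, 50]
```

A mismatch between predicted and observed band sizes flags a plasmid for
re-examination, for example by sequencing.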

The expression vector having the DNA construct prepared as discussed
above is transformed into a eukaryotic host cell. Suitable host cells for
cloning or expressing the vectors herein are yeast or higher eukaryote
cells.

Eukaryotic microbes such as filamentous fungi or yeast are suitable
hosts for vectors containing the product gene. Saccharomyces cerevisiae,
or common baker's yeast, is the most commonly used among lower eukaryotic
host microorganisms. However, a number of other genera, species, and
strains are commonly available and useful herein, such as S. pombe [Beach
and Nurse, Nature, 290:140 (1981)], Kluyveromyces lactis [Louvencourt et
al., J. Bacteriol., 737 (1983)], Yarrowia [EP 402,226], Pichia pastoris
[EP 183,070], Trichoderma reesei [EP 244,234], Neurospora crassa [Case et
al., Proc. Natl. Acad. Sci. USA, 76:5259-5263 (1979)], and Aspergillus
hosts such as A. nidulans [Ballance et al., Biochem. Biophys. Res.
Commun., 112:284-289 (1983); Tilburn et al., Gene, 26:205-221 (1983);
Yelton et al., Proc. Natl. Acad. Sci. USA, 81:1470-1474 (1984)] and A.
niger [Kelly and Hynes, EMBO J., 4:475-479 (1985)].

Suitable host cells for the expression of the product gene are also
derived from multicellular organisms. Such host cells are capable of
complex processing and glycosylation activities. In principle, any higher
eukaryotic cell culture is workable, whether from vertebrate or
invertebrate culture. Examples of invertebrate cells include plant and
insect cells. Numerous baculoviral strains and variants and corresponding
permissive insect host cells from hosts such as Spodoptera frugiperda
(caterpillar), Aedes aegypti (mosquito), Aedes albopictus (mosquito),
Drosophila melanogaster (fruitfly), and Bombyx mori have been identified.
See, e.g., Luckow et al., Bio/Technology, 6:47-55 (1988); Miller et al.,
in Genetic Engineering, Setlow, J.K. et al., eds., Vol. 8 (Plenum
Publishing, 1986), pp. 277-279; and Maeda et al., Nature, 315:592-594
(1985). A variety of such viral strains are publicly available, e.g., the
L-1 variant of Autographa californica NPV and the Bm-5 strain of Bombyx
mori NPV, and such viruses may be used as the virus herein according to the
present invention, particularly for transfection of Spodoptera frugiperda
cells.

Plant cell cultures of cotton, corn, potato, soybean, petunia,
tomato, and tobacco can be utilized as hosts. Typically, plant cells are
transfected by incubation with certain strains of the bacterium
Agrobacterium tumefaciens, which has been previously manipulated to contain
the product gene. During incubation of the plant cell culture with A.
tumefaciens, the product gene is transferred to the plant cell host such
that it is transfected, and will, under appropriate conditions, express the
product gene. In addition, regulatory and signal sequences compatible with
plant cells are available, such as the nopaline synthase promoter and
polyadenylation signal sequences. Depicker et al., J. Mol. Appl. Gen.,
1:561 (1982). In addition, DNA segments isolated from the upstream region
of the T-DNA 780 gene are capable of activating or increasing transcription
levels of plant-expressible genes in recombinant DNA-containing plant
tissue. EP 321,196 published 21 June 1989.

However, interest has been greatest in vertebrate cells, and
propagation of vertebrate cells in culture (tissue culture) has become a
routine procedure in recent years [Tissue Culture, Academic Press, Kruse
and Patterson, editors (1973)]. Examples of useful mammalian host cell
lines are monkey kidney CV1 line transformed by SV40 (COS-7, ATCC CRL
1651); human embryonic kidney line (293 or 293 cells subcloned for growth
in suspension culture, Graham et al., J. Gen Virol., 36:59 [1977]); baby
hamster kidney cells (BHK, ATCC CCL 10); Chinese hamster ovary cells/-DHFR
(CHO, Urlaub and Chasin, Proc. Natl. Acad. Sci. USA, 77:4216 [1980]);
dp12.CHO cells (EP 307,247 published 15 March 1989); mouse Sertoli cells
(TM4, Mather, Biol. Reprod., 23:243-251 [1980]); monkey kidney cells (CV1,
ATCC CCL 70); African green monkey kidney cells (VERO-76, ATCC CRL-1587);
human cervical carcinoma cells (HELA, ATCC CCL 2); canine kidney cells
(MDCK, ATCC CCL 34); buffalo rat liver cells (BRL 3A, ATCC CRL 1442); human
lung cells (W138, ATCC CCL 75); human liver cells (Hep G2, HB 8065); mouse
mammary tumor (MMT 060562, ATCC CCL51); TRI cells (Mather et al., Annals
N.Y. Acad. Sci., 383:44-68 [1982]); MRC 5 cells; FS4 cells; and a human
hepatoma line (Hep G2).

Host cells are transformed with the above-described expression or
cloning vectors of this invention and cultured in conventional nutrient
media modified as appropriate for inducing promoters, selecting
transformants, or amplifying the genes encoding the desired sequences.

Infection with Agrobacterium tumefaciens is used for transformation
of certain plant cells, as described by Shaw et al., Gene, 23:315 (1983)
and WO 89/05859 published 29 June 1989. For mammalian cells without such
cell walls, the calcium phosphate precipitation method of Graham and van
der Eb, Virology, 52:456-457 (1978) may be used. General aspects of
mammalian cell host system transformations have been described by Axel in
U.S. 4,399,216 issued 16 August 1983. Transformations into yeast are
typically carried out according to the method of Van Solingen et al., J.
Bact., 130:946 (1977) and Hsiao et al., Proc. Natl. Acad. Sci. (USA),
76:3829 (1979). However, other methods for introducing DNA into cells,
such as nuclear injection or protoplast fusion, may also be used.

In the preferred embodiment the DNA is introduced into the host cells
using electroporation. See Andreason, J. Tiss. Cult. Meth., 15:56-62
(1993), for a review of electroporation techniques useful for practicing
the instantly claimed invention. It was discovered that electroporation
techniques for introducing the DNA construct into the host cells were
preferable over calcium phosphate precipitation techniques insofar as the
latter could cause the DNA to break up and form concatemers.

The mammalian host cells used to express the product gene herein
may be cultured in a variety of media as discussed in the definitions
section above. The medium contains the selection agent used for selecting
transformed host cells which have taken up the DNA construct (either as an
intra- or extra-chromosomal element). To achieve selection of the
transformed eukaryotic cells, the host cells may be grown in cell culture
plates, and individual colonies expressing the selectable gene (and thus
the product gene) can be isolated and grown in growth medium until the
nutrients are depleted. The host cells are then analyzed for transcription
and/or translation as discussed below. The culture conditions, such as
temperature, pH, and the like, are those previously used with the host
cell selected for expression, and will be apparent to the ordinarily
skilled artisan.

Gene amplification and/or expression may be measured in a sample
directly, for example, by conventional Southern blotting, Northern blotting
to quantitate the transcription of mRNA (Thomas, Proc. Natl. Acad. Sci.
USA, 77:5201-5205 [1980]), dot blotting (DNA analysis), or in situ
hybridization, using an appropriately labeled probe, based on the sequences
provided herein. Various labels may be employed, most commonly
radioisotopes, particularly 32P. However, other techniques may also be
employed, such as using biotin-modified nucleotides for introduction into
a polynucleotide. The biotin then serves as the site for binding to avidin
or antibodies, which may be labeled with a wide variety of labels, such as
radionuclides, fluorescers, enzymes, or the like. Alternatively,
antibodies may be employed that can recognize specific duplexes, including
DNA duplexes, RNA duplexes, and DNA-RNA hybrid duplexes or DNA-protein
duplexes. The antibodies in turn may be labeled and the assay may be
carried out where the duplex is bound to a surface, so that upon the
formation of duplex on the surface, the presence of antibody bound to the
duplex can be detected.

Gene expression, alternatively, may be measured by immunological
methods, such as immunohistochemical staining of tissue sections and assay
of cell culture or body fluids, to quantitate directly the expression of
gene product. With immunohistochemical staining techniques, a cell sample
is prepared, typically by dehydration and fixation, followed by reaction
with labeled antibodies specific for the gene product, where the labels
are usually visually detectable, such as enzymatic labels, fluorescent
labels, luminescent labels, and the like. A particularly sensitive staining
technique suitable for use in the present invention is described by Hsu et
al., Am. J. Clin. Path., 75:734-738 (1980).

In the preferred embodiment, the mRNA is analyzed by quantitative PCR
(to determine the efficiency of splicing) and protein expression is
measured using ELISA as described in Example 1 herein.
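
The text does not say how splicing efficiency is derived from the quantitative PCR data. As a minimal sketch, assume one hypothetical primer pair that amplifies only unspliced message (inside the DHFR intron) and a second pair that amplifies a region shared by spliced and unspliced message, both with equal, near-ideal amplification efficiency. The spliced fraction can then be estimated with the comparative Ct (2^-ΔCt) approach; the function name and assay design are illustrative assumptions, not the procedure used here.

```python
def fraction_spliced(ct_total: float, ct_unspliced: float, efficiency: float = 1.0) -> float:
    """Estimate the spliced fraction of a dicistronic transcript from qPCR Ct values.

    Hypothetical assay design (not specified in the source):
      ct_total     -- Ct of an amplicon shared by spliced and unspliced mRNA
      ct_unspliced -- Ct of an amplicon lying inside the DHFR intron
      efficiency   -- amplification efficiency (1.0 means doubling per cycle)
    """
    if ct_unspliced < ct_total:
        raise ValueError("unspliced signal cannot exceed total signal")
    # Each cycle multiplies product by (1 + efficiency), so a later Ct means less template.
    unspliced_fraction = (1.0 + efficiency) ** -(ct_unspliced - ct_total)
    return 1.0 - unspliced_fraction


# An intron amplicon appearing 3 cycles after the shared amplicon implies
# 1/8 of the message is unspliced.
print(fraction_spliced(ct_total=20.0, ct_unspliced=23.0))  # 0.875
```

Under this reading, the efficient WT ras donor would show a large Ct gap between the two amplicons, while the ΔGT control would show almost none.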

The product of interest preferably is recovered from the culture
medium as a secreted polypeptide, although it also may be recovered from
host cell lysates when directly expressed without a secretory signal. When
the product gene is expressed in a recombinant cell other than one of human
origin, the product of interest is completely free of proteins or
polypeptides of human origin. However, it is necessary to purify the
product of interest from recombinant cell proteins or polypeptides to
obtain preparations that are substantially homogeneous as to the product
of interest. As a first step, the culture medium or lysate is centrifuged
to remove particulate cell debris. The product of interest thereafter is
purified from contaminant soluble proteins and polypeptides, for example,
by fractionation on immunoaffinity or ion-exchange columns; ethanol
precipitation; reverse phase HPLC; chromatography on silica or on an anion
exchange resin such as DEAE; chromatofocusing; SDS-PAGE; ammonium sulfate
precipitation; gel filtration using, for example, Sephadex G-75;
chromatography on plasminogen columns to bind the product of interest; and
protein A Sepharose columns to remove contaminants such as IgG.

The following examples are offered by way of illustration only and
are not intended to limit the invention in any manner. All patent and
literature references cited herein are expressly incorporated by reference.

EXAMPLE 1 

tPA production using the dicistronic expression vectors 
It was sought to increase the homogeneity of expression levels among
stable clones by expressing a selectable marker (such as DHFR) and the
protein of interest from a single promoter. These vectors divert most of
the transcript to product expression while linking it at a fixed ratio to
DHFR expression via differential splicing.

Vectors were constructed which were derived from the vector pRK (Suva
et al., Science, 237:893-896 [1987]), which contains an intron between the
cytomegalovirus immediate early promoter (CMV) and the cDNA that encodes
the polypeptide of interest. The intron of pRK is 139 nucleotides in
length and has a splice donor site derived from the cytomegalovirus
immediate early gene (CMVIE) and a splice acceptor site from an IgG heavy
chain variable region (VH) gene (Eaton et al., Biochemistry, 25:8343 [1986]).
DHFR/intron vectors were constructed by inserting an EcoRV linker
into the BstXI site present in the intron of pRK7. An 830 base-pair
fragment containing the mouse DHFR coding sequence was inserted to obtain
DHFR intron expression vectors which differ only in the sequence that
comprises the splice donor site. Those sequences were altered by
overlapping PCR mutagenesis to obtain sequences that match splice donor
sites found between exons 3 and 4 of normal and mutant Ras genes. PCR was
also used to destroy the splice donor site.

A mouse DHFR cDNA fragment (Simonsen et al., Proc. Natl. Acad. Sci.
USA, 80:2495-2499 [1983]) was inserted into the intron of this vector 59
nucleotides downstream of the splice donor site. The splice donor site of
this vector was altered by mutagenesis to change the ratio of spliced to
non-spliced message in transfected cells. It has previously been shown
that a single nucleotide change (G to A) converted a relatively efficient
splice donor site found in the normal ras gene into an inefficient splice
site (Cohen et al., Nature, 334:119-124 [1988]). This effect has been
demonstrated in the context of the ras gene and confirmed when these
sequences were transferred to human growth hormone constructs (Cohen et
al., Cell, 58:461-472 [1989]). Additionally, a nonfunctional 5' splice
site (GT to CA) was constructed as a control (ΔGT). A polylinker was
inserted 35 nucleotides downstream of the 3' splice site to accept the cDNA
of interest. A vector containing tPA (Pennica et al., Nature, 301:214-221
[1983]) was linearized downstream of the polyadenylation site before it was
introduced into CHO cells (Potter et al., Proc. Natl. Acad. Sci. USA,
81:7161 [1984]).

Plasmid DNAs that contained DHFR/intron, tPA and (a) the wild type ras
splice donor site (WT ras), i.e. Figure 3 (SEQ ID NO: 1), (b) the mutant ras
splice donor site, or (c) a nonfunctional splice donor site (ΔGT) were
introduced into CHO DHFR-minus cells by electroporation. The intron vectors
were each linearized downstream of the polyadenylation site by restriction
endonuclease treatment. The control vector was linearized downstream of the
second polyadenylation site. The DNAs were ethanol precipitated after
phenol/chloroform extraction and were resuspended in 20 µl of 1/10 Tris-EDTA.
Then, 10 µg of DNA was incubated with 10⁷ CHO.dp12 cells (EP 307,247
published 15 March 1989) in 1 ml of PBS on ice for 10 min. before
electroporation at 400 volts and 330 µF using a BRL Cell Porator.

Cells were returned to ice for 10 min. before being plated into
non-selective medium. After 24 hours, cells were fed nucleoside-free medium
to select for stable DHFR+ clones, which were pooled. The pooled DHFR+
clones were lysed and mRNAs were prepared.

To prepare the mRNA, RNA was extracted from 5 x 10⁷ cells, which were
grown from pools of more than 200 clones derived from the stable
transfection of the three vectors (the essential construction of which is
shown in Figure 1B), and from non-transfected CHO cells. RNA was purified
over oligo-dT cellulose (Collaborative Biomedical Products), and 10 µg of
mRNA was then subjected to Northern blotting, which involved running the
mRNA on a 1.2% agarose, 6.6% formaldehyde gel and transferring it to a
nylon filter (Stratagene Duralon-UV membrane), which was prehybridized,
probed and washed according to the manufacturer's instructions.

The filter was probed sequentially using probes (shown in Figure 1B)
that would detect (a) the full length message, (b) both full length and
spliced message, or (c) beta actin. Probing with the long probe showed
that the vector that contains the efficient splice donor site (i.e. WT ras)
generates predominantly an mRNA of the size predicted for the spliced
product, while the other two vectors gave rise primarily to an mRNA that
corresponds in size to non-spliced message. The DHFR probe detected only
full length message and demonstrated that the WT ras splice donor derived
vector generates very little full length message with which to confer a
DHFR positive phenotype.

Figure 4 shows the number of DHFR positive colonies obtained after
duplicate electroporations with the three intron vectors described above
and with a conventional vector that has a CMV promoter driving tPA and an
SV40 promoter driving DHFR (see Figure 2). The increase in colony number
parallels the increase in full length message that accumulates with the
modification of the splice donor sites. The conventional vector
efficiently generates colonies and does not vary significantly from the
ΔGT construct.

The level of tPA expression was determined by seeding cells in 1 ml
of F12:DMEM (50:50, with 5% FBS) in 24-well dishes to near confluency.
Growth of the cells continued until the medium was exhausted. The medium
was then assayed by ELISA for tPA production. Briefly, anti-tPA antibody
was coated onto the wells of an ELISA microtiter plate, medium samples were
added to the wells followed by washing, and binding of the antigen (tPA)
was then quantified using horseradish peroxidase (HRPO) labeled anti-tPA
antibody.
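
The ELISA is described, but not how absorbance readings were converted into tPA concentrations. A common approach is to read unknowns off a standard curve of purified protein. The sketch below assumes piecewise straight-line interpolation in log-log space between bracketing standards, a simpler stand-in for the four-parameter logistic fits often used; the function name, standard concentrations and absorbances are hypothetical.

```python
import bisect
import math


def concentration_from_od(standards, od, dilution=1.0):
    """Read an unknown's concentration off an ELISA standard curve.

    standards -- (concentration, optical density) pairs sorted so that both
                 values increase; all values must be positive (hypothetical data)
    od        -- optical density measured for the diluted medium sample
    dilution  -- fold dilution applied to the medium before the assay
    """
    if any(c <= 0 or o <= 0 for c, o in standards):
        raise ValueError("log-log interpolation needs positive standards")
    concs = [c for c, _ in standards]
    ods = [o for _, o in standards]
    if not ods[0] <= od <= ods[-1]:
        raise ValueError("OD outside the standard range; re-assay at another dilution")
    i = bisect.bisect_left(ods, od)
    if ods[i] == od:
        return concs[i] * dilution
    # Straight line in log-log space between the bracketing standards i-1 and i.
    x0, x1 = math.log(concs[i - 1]), math.log(concs[i])
    y0, y1 = math.log(ods[i - 1]), math.log(ods[i])
    x = x0 + (math.log(od) - y0) * (x1 - x0) / (y1 - y0)
    return math.exp(x) * dilution


# Hypothetical tPA standards (ng/ml) with their absorbance readings.
standards = [(0.5, 0.10), (1.0, 0.20), (2.0, 0.40), (4.0, 0.80)]
print(concentration_from_od(standards, od=0.30, dilution=10))  # roughly 15 ng/ml in the medium
```

Readings outside the standard range are rejected rather than extrapolated, since exhausted media from high expressors would normally be re-assayed at a higher dilution.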

Figure 5A depicts the titers of secreted tPA protein after pooling
the clones of each group shown in Figure 4. While the number of colonies
increased with a weakening of splice donor function, the inverse was seen
with respect to tPA expression. The expression levels are consistent with
the RNA products that are observed: as more of the dicistronic message is
spliced, an increased amount of message will contain tPA as the first open
reading frame, resulting in increased tPA expression. A mutation of GT to
CA in the splice donor site results in an abundance of DHFR positive
colonies which express undetectable levels of tPA, possibly resulting from
inefficient utilization of the second AUG. Importantly, Figure 5A also
shows that expression levels obtained from one of the dicistronic vectors
(with WT ras SD) were about threefold higher than those obtained with the
control vector containing a CMV promoter/enhancer driving tPA, an SV40
promoter/enhancer controlling DHFR and SV40 polyadenylation signals
controlling the expression of tPA and DHFR.

Additionally, the homogeneity of expression in the pools was
investigated. Figure 5B shows that all 20 clones generated by the WT ras
splice donor site derived dicistronic vector express detectable levels of
tPA, while only 4 of 20 clones generated by the control vector express tPA.
None of the clones transfected with the non-splicing (ΔGT) vector expressed
tPA levels detectable by ELISA. This finding is consistent with previous
observations that relatively few clones generated by conventional vectors
make useful levels of protein.

Expression of tPA was increased following methotrexate amplification
of pools. Figure 5C shows that 2 of the dicistronic vector derived pools
(i.e. with WT ras and mutant ras SD sites) increased in expression markedly
(8.4 and 7.7 fold), while the pool generated by the conventional vector
increased only slightly (2.8 fold) when each was subjected to 200 nM Mtx.
An overall increase of 9 fold was obtained using the best dicistronic (WT
ras SD) versus the conventional vector following amplification. Growth of
the highest expressing amplified pool in nutrient rich production medium
yielded titers of 4.2 µg/ml tPA.
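
The 9 fold figure can be cross-checked against the other tPA numbers, assuming the fold changes simply multiply and taking the 8.4 fold increase as the WT ras pool, as the order of the listing suggests: that pool started about threefold above the conventional pool (Figure 5A) and then rose 8.4 fold versus 2.8 fold on amplification.

```python
# Values reported in Example 1. Treating them as exact multiplicative factors is an
# assumption; "about threefold" in Figure 5A is only approximate.
pre_amplification_ratio = 3.0  # WT ras SD pool / conventional pool before Mtx
fold_wt_ras = 8.4              # increase of the WT ras SD pool in 200 nM Mtx
fold_conventional = 2.8        # increase of the conventional pool in 200 nM Mtx

post_amplification_ratio = pre_amplification_ratio * fold_wt_ras / fold_conventional
print(round(post_amplification_ratio, 1))  # 9.0
```

The agreement suggests the 9 fold advantage comes from a threefold head start multiplied by a threefold better response to amplification.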

It was shown that manipulation of the splice donor sequence alters
the ratio of spliced to full length message and the number of colonies that
form in selective medium. It was also shown that dicistronic expression
vectors generate clones that express high levels of recombinant proteins.
Surprisingly, it was possible to isolate high expressors which had the
efficient WT ras splice donor site by selection for DHFR+ cells despite the
efficiency with which the DHFR gene was spliced from the RNA precursors
formed in these cells.

EXAMPLE 2

TNFr-IgG production using the dicistronic expression vectors
To demonstrate the general applicability of this approach, a second product
was evaluated in the dicistronic vector system containing, as the DNA of
interest, an immunoadhesin (TNFr-IgG) capable of binding tumor necrosis
factor (TNF) (Ashkenazi et al., Proc. Natl. Acad. Sci. USA, 88:10535-10539
[1991]). The experiments described in Example 1 above were essentially
repeated except that the product gene encoded the immunoadhesin TNFr-IgG.
Plasmid DNAs that contained a TNFr-IgG cDNA and (a) WT ras, i.e. Figure
6 (SEQ ID NO: 2), (b) mutant ras or (c) nonfunctional splice donor site
(ΔGT) were introduced into CHO.dp12 cells as discussed for Example 1.
See Figure 1C for an illustration of the DNA constructs.

It was discovered that the number of DHFR positive colonies generated
by the three vectors was similar to that seen with the tPA constructs.
Expression of TNFr-IgG also paralleled that seen with the tPA constructs
(Figure 7A). Amplification of pools from two of the constructs showed a
marked increase in expression of immunoadhesin (9.6 and 6.6 fold) (Figure
7B). The best of these amplified pools expressed 9.5 µg/ml when grown in
nutrient rich production medium.

Thus, it was again shown that dicistronic expression vectors generate
clones that express high levels of recombinant proteins. Furthermore,
contrary to expectations, it was discovered that isolation of high product
expressing DHFR+ host cells was possible using an efficient splice donor
site (i.e. the WT ras splice donor site).

EXAMPLE 3 

Antibody production using a dicistronic expression vector 

The usefulness of this system for antibody expression was evaluated
by testing production of an antibody directed against IgE (Presta et al.,
Journal of Immunology, 151:2623-2632 [1993]). Further, the flexibility of
the system with regard to transcription initiation was tested by replacing
the CMV promoter/enhancer present in the previous vectors with the
promoter/enhancer derived from the early region of SV40 virus (Griffin,
B., Structure and Genomic Organization of SV40 and Polyoma Virus, in J.
Tooze [Ed.], DNA Tumor Viruses, Cold Spring Harbor Laboratory, Cold Spring
Harbor, New York). The heavy chain of the antibody was inserted downstream
of DHFR as described for the earlier tPA and TNFr-IgG constructs.
Additionally, a new splice donor site sequence (GAC:GTAAGT) was engineered
into the vector which matches the consensus splice donor site more closely
than did the splice donor sites present in the vectors tested in Examples
1 and 2. The resultant expression vector is shown in Figures 1D and 9.
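
To illustrate what matching the consensus "more closely" means, the sketch below counts position-by-position agreement with the commonly cited 5' splice site consensus MAG|GTRAGT (M = A or C, R = A or G), written in the document's exon:intron notation. The consensus string, the function name and the plain match count are assumptions of this sketch; a match count is only a crude proxy for splicing efficiency, as the single G to A change in the ras donor described in Example 1 shows.

```python
# Commonly cited 5' splice site consensus: 3 exon bases, then 6 intron bases (IUPAC codes).
CONSENSUS = "MAG:GTRAGT"
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "M": "AC", "R": "AG"}


def score_donor(site: str) -> tuple[int, bool]:
    """Return (positions matching the consensus, invariant GT present) for 'NNN:NNNNNN'."""
    exon, intron = site.upper().split(":")
    cons_exon, cons_intron = CONSENSUS.split(":")
    if len(exon) != len(cons_exon) or len(intron) != len(cons_intron):
        raise ValueError("expected 3 exon bases, ':', then 6 intron bases")
    matches = sum(base in IUPAC[code]
                  for base, code in zip(exon + intron, cons_exon + cons_intron))
    return matches, intron.startswith("GT")


print(score_donor("GAC:GTAAGT"))  # (7, True)  donor engineered in Example 3
print(score_donor("GAC:CAAAGT"))  # (5, False) hypothetical GT-to-CA variant, as in the ΔGT control
```

The intron side of GAC:GTAAGT matches all six consensus positions; the loss of the invariant GT, rather than the drop in match count, is what abolishes splicing in the ΔGT control.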

It was discovered that this vector produced fewer colonies than the
vectors previously tested, and produced predominantly a spliced RNA
product. A second vector was constructed to have the light chain of the
antibody under control of the SV40 promoter/enhancer and poly-A and the
hygromycin B resistance gene under control of the CMV promoter/enhancer and
SV40 poly-A. These vectors were linearized at unique HpaI sites downstream
of the poly-A signal, mixed at a ratio of light chain vector to heavy chain
vector of 10:3 and electroporated into CHO cells using an optimized
protocol (as discussed in Examples 1 and 2).

Figure 11 shows the levels of antibody expressed by clones and pools
after selection in hygromycin B followed by selection for DHFR expression.
All 20 of the clones analyzed expressed high levels of antibody when grown
in rich medium and varied from one another by only a factor of four. A
pool of antibody producing clones was generated and assayed shortly after
it was established. That pool was grown continuously for 6 weeks without
a significant decrease in productivity, demonstrating that its stability
was sufficient to generate gram quantities of protein from its large scale
culture.

The pool was subjected to methotrexate amplification at 200 nM and 1 µM
and achieved a greater than 2 fold increase in antibody titer. The 1 µM Mtx
resistant pool achieved a titer of 41 mg/L when grown under optimal
conditions in suspension culture.
The structure of the expressed antibody was examined. Proteins
expressed by the 200 nM methotrexate resistant pool and by a well
characterized expression clone generated by conventional vectors (Presta
et al. [1993], supra) were metabolically labeled with ³⁵S-cysteine and
³⁵S-methionine. In particular, confluent 35 mm plates of cells were
metabolically labeled with 50 µCi each of ³⁵S-methionine and ³⁵S-cysteine
(Amersham) in serum-free, cysteine- and methionine-free F12:DMEM. After one
hour, nutrient rich production medium was added and labeled proteins were
allowed to "chase" into the medium for six more hours. Proteins were run
on a 12% SDS/PAGE gel (NOVEX) non-reduced or following reduction with
β-mercaptoethanol. Dried gels were exposed to film for 16 hours. CHO
control cells were also labeled.

The majority of the antibody protein is secreted with a molecular
weight of about 155 kilodaltons, consistent with a properly disulfide-linked
antibody molecule with 2 light and 2 heavy chains. Upon reduction
the molecular weight shifts to 2 approximately equally abundant proteins
of 22.5 and 55 kilodaltons. The protein generated from the pool is
indistinguishable from the antibody produced by the well characterized
expression clone, with no apparent increase of free heavy or light chain
expressed by the pool.
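
The reduced and non-reduced bands are mutually consistent: a molecule of two light and two heavy chains should run at roughly twice the sum of one light and one heavy chain. The arithmetic, using the approximate SDS/PAGE values quoted above:

```python
light_chain_kda = 22.5  # reduced band, approximate
heavy_chain_kda = 55.0  # reduced band, approximate

# Two light and two heavy chains held together by disulfide bonds (H2L2).
intact_kda = 2 * light_chain_kda + 2 * heavy_chain_kda
print(intact_kda)  # 155.0
```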

CONCLUSION 

The efficient expression system described herein utilizes vectors
consisting of promoter/enhancer elements followed by an intron containing
the selectable marker coding sequence, followed by the cDNA of interest and
a polyadenylation signal.

Several splice donor site sequences were tested for their effect on
colony number and expression of the cDNA of interest: a nonfunctional
splice donor site, the splice donor sites found in the intron between exons
3 and 4 of mutant (mutant ras) and normal (WT ras) forms of the Harvey Ras
gene, and another efficient SD site (see Example 3). The vectors were
designed to direct expression of dicistronic primary transcripts. Within a
transfected cell, some of the transcripts remain full length while the
remainder are spliced to excise the DHFR coding sequence. When the splice
donor site is weakened or destroyed, an increase in colony number is
observed.

Expression levels show the inverse pattern, with the most efficient
splice donor sites generating the highest levels of tPA, TNFr immunoadhesin
or anti-IgE antibody.

The homogeneity of expression of clones generated by the ras splice
donor site intron DHFR vectors was compared to that of clones generated from
a conventional vector with a separate promoter/enhancer and polyadenylation
signal for each of DHFR and tPA. The DHFR intron vector gives rise to
colonies that are much more homogeneous with regard to expression than
those generated by the conventional vector. Non-expressing clones derived
from the conventional vector may be the result of breaks in the tPA or
TNFr-IgG domain of the plasmid during integration into the genome, or the
result of methylation of promoter elements (Busslinger et al., Cell,
34:197-206 [1983]; Watt et al., Genes and Development, 2:1136-1143 [1988])
driving tPA or TNFr-IgG expression. Promoter silencing by methylation or
breaks in the DHFR-intron vectors would very likely render them incapable
of conferring a DHFR positive phenotype.

It was found that pools generated by the DHFR-intron vectors could
be amplified in methotrexate and would increase in expression by a factor
of 8.4 (tPA) or 9.8 (TNFr-IgG). Pools from conventional vectors increased
by only 2.8 and 3.0 fold for tPA and TNFr-IgG when amplified similarly.
Amplified pools resulted in 9 fold higher tPA levels and 15 fold higher
TNFr-IgG levels when compared to the conventional vector amplified pools.
Without being limited to any theory, the increase in expression of
methotrexate resistant pools derived from the dicistronic vectors is likely
due to the transcriptional linkage of DHFR and the product: when cells are
selected for increased DHFR expression, they consistently over-express
product. Conventional approaches lack linkage between selectable marker and
cDNA expression, and therefore methotrexate amplification often generates
DHFR overexpression without a concomitant increase in product expression.

A further increase of 4 and 6.3 fold in expression was obtained when
amplified tPA and TNFr-IgG pools were transferred from the media used for
the selections and amplifications to a nutrient rich production medium.

In Example 3, the expression vector had a splice donor site that more
closely matches the consensus splice donor sequence and had the heavy chain
of a humanized anti-IgE antibody inserted downstream. This vector was
linearized and co-electroporated with a second linearized vector that
expresses the hygromycin resistance gene and the light chain of the
antibody, each under the control of its own promoter/enhancer and poly-A
signals. An excess of light chain expression vector over the heavy chain
dicistronic expression vector was used to bias in favor of light chain
expression. Clones and a pool were generated after hygromycin B and DHFR
selections. The clones were found to express relatively consistent, high
levels of antibody, as did the pool. The 1 µM pool achieved a titer of
41 mg/L when grown under optimal conditions in suspension culture.

The anti-IgE antibody was assessed by metabolic labeling followed by
SDS/PAGE under reducing and non-reducing conditions and found to be
indistinguishable from the protein expressed by a highly characterized
clonal cell line. Of particular importance is the finding that no free
light chain is observed in the pool relative to the clone.

A stable expression system for CHO cells has been developed that
produces high levels of recombinant proteins rapidly and with less effort
than that required by other expression systems. The vector system
generates stable clones that express consistently high levels, thereby
reducing the number of clones that must be screened to obtain a highly
productive clonal line. Alternatively, pools have been used to
conveniently generate moderate to high levels of protein. This approach
may be particularly useful when a number of related proteins are to be
expressed and compared.

Without being limited to this theory, it is possible that the vectors that
have very efficient splice donor sites generate very productive clones
because so little transcript remains non-spliced that only integration
events that lead to the generation of high levels of RNA produce enough
DHFR protein to give rise to colonies in selective medium. The high level
of spliced message from such clones is then translated into abundant
amounts of the protein of interest. Pools of clones made concurrently by
introducing conventional vectors expressed lower levels of protein, were
unstable with regard to long term expression, and their expression could
not be appreciably increased when the cells were subjected to methotrexate
amplification.
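
The hypothesis in the preceding paragraph can be made concrete with a toy model. The sketch is purely illustrative and not fitted to any data in this document. It assumes that each integration event yields a transcript level drawn from a log-normal distribution (a hypothetical stand-in for position effects), that a fraction s of each transcript is spliced into product message, that the unspliced remainder (1 - s) drives DHFR, and that a colony forms only when DHFR reaches a fixed threshold. The function name and all parameter values are hypothetical.

```python
import random
import statistics


def simulate_selection(splice_efficiencies, n_integrations=20_000, threshold=1.0, seed=0):
    """Toy model of DHFR selection with a dicistronic, differentially spliced transcript.

    All parameters are hypothetical. Returns, for each splice efficiency s,
    (number of surviving colonies, mean product level, minimum product level).
    """
    rng = random.Random(seed)
    # The same integration events are reused for every s, so results are comparable.
    levels = [rng.lognormvariate(0.0, 1.0) for _ in range(n_integrations)]
    results = {}
    for s in splice_efficiencies:
        # Unspliced message (1 - s) makes DHFR; a colony needs DHFR >= threshold.
        products = [s * t for t in levels if (1.0 - s) * t >= threshold]
        if products:
            results[s] = (len(products), statistics.mean(products), min(products))
        else:
            results[s] = (0, 0.0, 0.0)
    return results


for s, (colonies, mean_product, min_product) in simulate_selection([0.5, 0.75, 0.9]).items():
    print(f"s={s:.2f}  colonies={colonies:5d}  mean product={mean_product:6.2f}  "
          f"min product={min_product:6.2f}")
```

In this model every survivor has a transcript level of at least threshold / (1 - s), so its product level is at least threshold * s / (1 - s) (1, 3 and 9 for the three values of s with a threshold of 1). Raising splice efficiency therefore lowers the colony count while raising the floor on product expression, qualitatively mirroring the inverse relationship between colony number and titer seen in Figures 4 and 5A.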

The system developed herein is versatile in that it allows high
levels of single and multiple subunit polypeptides to be rapidly generated
from clones or pools of stable transfectants. This expression system
combines the advantages of transient expression systems (rapid generation
of research amounts of protein without intensive labor) with the
concurrent development of highly productive stable production cell lines.
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 7360 bases

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:



TTCGAGCTCG CCCGACATTG ATTATTGACT AGTTATTAAT AGTAATCAAT 50
TACGGGGTCA TTAGTTCATA GCCGATATAT GGAGTTCCGC GTTACATAAC 100
TTACGGTAAA TGGCCCGCCT GGCTGACCGC CCAACGACCC CCGCCCATTG 150
ACGTCAATAA TGACGTATGT TCCCATAGTA ACGCCAATAG GGACTTTCCA 200
TTGACGTCAA TGGGTGGAGT ATTTACGGTA AACTGCCCAC TTGGCAGTAC 250
ATCAAGTGTA TCATATGCCA AGTACGCCCC CTATTGACGT CAATGACGGT 300
AAATGGCCCG CCTGGCATTA TGCCCAGTAC ATGACCTTAT GGGACTTTCC 350
TACTTGGCAG TACATCTACG TATTAGTCAT CGCTATTACC ATGGTGATGC 400
GGTTTTGGCA GTACATCAAT GGGCGTGGAT AGCGGTTTGA CTCACGGGGA 450
TTTCCAAGTC TCCACCCCAT TGACGTCAAT GGGAGTTTGT TTTGGCACCA 500
AAATCAACGG GACTTTCCAA AATGTCGTAA CAACTCCGCC CCATTGACGC 550
AAATGGGCGG TAGGCGTGTA CGGTGGGAGG TCTATATAAG CAGAGCTCGT 600
TTAGTGAACC GTCAGATCGC CTGGAGACGC CATCCACGCT GTTTTGACCT 650
CCATAGAAGA CACCGGGACC GATCCAGCCT CCGCGGCCGG GAACGGTGCA 700
TTGGAACGCG GATTCCCCGT GCCAAGAGTG CTGTAAGTAC CGCCTATAGA 750
GCGATAAGAG GATTTTATCC CCGCTGCCAT CATGGTTCGA CCATTGAACT 800
GCATCGTCGC CGTGTCCCAA AATATGGGGA TTGGCAAGAA CGGAGACCTA 850
CCCTGCCCTC CGCTCAGGAA CGCGTTCAAG TACTTCCAAA GAATGACCAC 900
AACCTCTTCA GTGGAAGGTA AACAGAATCT GGTGATTATG GGTAGGAAAA 950
CCTGGTTCTC CATTCCTGAG AAGAATCGAC CTTTAAAGGA CAGAATTAAT 1000
ATAGTTCTCA GTAGAGAACT CAAAGAACCA CCACGAGGAG CTCATTTTCT 1050
TGCCAAAAGT TTGGATGATG CCTTAAGACT TATTGAACAA CCGGAATTGG 1100
CAAGTAAAGT AGACATGGTT TGGATAGTCG GAGGCAGTTC TGTTTACCAG 1150
GAAGCCATGA ATCAACCAGG CCACCTTAGA CTCTTTGTGA CAAGGATCAT 1200
GCAGGAATTT GAAAGTGACA CGTTTTTCCC AGAAATTGAT TTGGGGAAAT 1250
ATAAACCTCT CCCAGAATAC CCAGGCGTCC TCTCTGAGGT CCAGGAGGAA 1300
AAAGGCATCA AGTATAAGTT TGAAGTCTAC GAGAAGAAAG ACTAACAGGA 1350
AGATGCTTTC AAGTTCTCTG CTCCCCTCCT AAAGCTATGC ATTTTTATAA 1400
GACCATGGGA CTTTTGCTGG CTTTAGACCC CCTTGGCTTC GTTAGAACGC 1450
GGCTACAATT AATACATAAC CTTATCTATC ATACACATAG ATTTAGGTGA 1500
CACTATAGAA TAACATCCAC TTTGCCTTTC TCTCCACAGG TGTCACTCCA 1550
GGTCAACTGC ACCTCGGTTC TAAGCTTGGG CTGCAGGTCG CCGTGAATTT 1600
AAGGGACGCT GTGAAGCAAT CATGGATGCA ATGAAGAGAG GGCTCTGCTG 1650
TGTGCTGCTG CTGTGTGGAG CAGTCTTCGT TTCGCCCAGC CAGGAAATCC 1700
ATGCCCGATT CAGAAGAGGA GCCAGATCTT ACCAAGTGAT CTGCAGAGAT 1750
GAAAAAACGC AGATGATATA CCAGCAACAT CAGTCATGGC TGCGCCCTGT 1800
GCTCAGAAGC AACCGGGTGG AATATTGCTG GTGCAACAGT GGCAGGGCAC 1850
AGTGCCACTC AGTGCCTGTC AAAAGTTGCA GCGAGCCAAG GTGTTTCAAC 1900
GGGGGCACCT GCCAGCAGGC CCTGTACTTC TCAGATTTCG TGTGCCAGTG 1950
CCCCGAAGGA TTTGCTGGGA AGTGCTGTGA AATAGATACC AGGGCCACGT 2000
GCTACGAGGA CCAGGGCATC AGCTACAGGG GCACGTGGAG CACAGCGGAG 2050
AGTGGCGCCG AGTGCACCAA CTGGAACAGC AGCGCGTTGG CCCAGAAGCC 2100
CTACAGCGGG CGGAGGCCAG ACGCCATCAG GCTGGGCCTG GGGAACCACA 2150
ACTACTGCAG AAACCCAGAT CGAGACTCAA AGCCCTGGTG CTACGTCTTT 2200
AAGGCGGGGA AGTACAGCTC AGAGTTCTGC AGCACCCCTG CCTGCTCTGA 2250
GGGAAACAGT GACTGCTACT TTGGGAATGG GTCAGCCTAC CGTGGCACGC 2300
ACAGCCTCAC CGAGTCGGGT GCCTCCTGCC TCCCGTGGAA TTCCATGATC 2350
CTGATAGGCA AGGTTTACAC AGCACAGAAC CCCAGTGCCC AGGCACTGGG 2400
CCTGGGCAAA CATAATTACT GCCGGAATCC TGATGGGGAT GCCAAGCCCT 2450
GGTGCCACGT GCTGAAGAAC CGCAGGCTGA CGTGGGAGTA CTGTGATGTG 2500
CCCTCCTGCT CCACCTGCGG CCTGAGACAG TACAGCCAGC CTCAGTTTCG 2550
CATCAAAGGA GGGCTCTTCG CCGACATCGC CTCCCACCCC TGGCAGGCTG 2600
CCATCTTTGC CAAGCACAGG AGGTCGCCCG GAGAGCGGTT CCTGTGCGGG 2650
GGCATACTCA TCAGCTCCTG CTGGATTCTC TCTGCCGCCC ACTGCTTCCA 2700
GGAGAGGTTT CCGCCCCACC ACCTGACGGT GATCTTGGGC AGAACATACC 2750
GGGTGGTCCC TGGCGAGGAG GAGCAGAAAT TTGAAGTCGA AAAATACATT 2800
GTCCATAAGG AATTCGATGA TGACACTTAC GACAATGACA TTGCGCTGCT 2850
GCAGCTGAAA TCGGATTCGT CCCGCTGTGC CCAGGAGAGC AGCGTGGTCC 2900
GCACTGTGTG CCTTCCCCCG GCGGACCTGC AGCTGCCGGA CTGGACGGAG 2950
TGTGAGCTCT CCGGCTACGG CAAGCATGAG GCCTTGTCTC CTTTCTATTC 3000
GGAGCGGCTG AAGGAGGCTC ATGTCAGACT GTACCCATCC AGCCGCTGCA 3050
CATCACAACA TTTACTTAAC AGAACAGTCA CCGACAACAT GCTGTGTGCT 3100
GGAGACACTC GGAGCGGCGG GCCCCAGGCA AACTTGCACG ACGCCTGCCA 3150
GGGCGATTCG GGAGGCCCCC TGGTGTGTCT GAACGATGGC CGCATGACTT 3200
TGGTGGGCAT CATCAGCTGG GGCCTGGGCT GTGGACAGAA GGATGTCCCG 3250
GGTGTGTACA CCAAGGTTAC CAACTACCTA GACTGGATTC GTGACAACAT 3300
GCGACCGTGA CCAGGAACAC CCGACTCCTC AAAAGCAAAT GAGATCCCGC 3350
CTCTTCTTCT TCAGAAGACA CTGCAAAGGC GCAGTGCTTC TCTACAGACT 3400
TCTCCAGACC CACCACACCG CAGAAGCGGG ACGAGACCCT ACAGGAGAGG 3450
GAAGAGTGCA TTTTCCCAGA TACTTCCCAT TTTGGAAGTT TTCAGGACTT 3500
GGTCTGATTT CAGGATACTC TGTCAGATGG GAAGACATGA ATGCACACTA 3550
GCCTCTCCAG GAATGCCTCC TCCCTGGGCA GAAGTGGGGG GAATTCAATC 3600
GATGGCCGCC ATGGCCCAAC TTGTTTATTG CAGCTTATAA TGGTTACAAA 3650
TAAAGCAATA GCATCACAAA TTTCACAAAT AAAGCATTTT TTTCACTGCA 3700
TTCTAGTTGT GGTTTGTCCA AACTCATCAA TGTATCTTAT CATGTCTGGA 3750
TCGATCGGGA ATTAATTCGG CGCAGCACCA TGGCCTGAAA TAACCTCTGA 3800
AAGAGGAACT TGGTTAGGTA CCTTCTGAGG CGGAAAGAAC CAGCTGTGGA 3850
ATGTGTGTCA GTTAGGGTGT GGAAAGTCCC CAGGCTCCCC AGCAGGCAGA 3900
AGTATGCAAA GCATGCATCT CAATTAGTCA GCAACCAGGT GTGGAAAGTC 3950
CCCAGGCTCC CCAGCAGGCA GAAGTATGCA AAGCATGCAT CTCAATTAGT 4000
CAGCAACCAT AGTCCCGCCC CTAACTCCGC CCATCCCGCC CCTAACTCCG 4050
CCCAGTTCCG CCCATTCTCC GCCCCATGGC TGACTAATTT TTTTTATTTA 4100
TGCAGAGGCC GAGGCCGCCT CGGCCTCTGA GCTATTCCAG AAGTAGTGAG 4150
GAGGCTTTTT TGGAGGCCTA GGCTTTTGCA AAAAGCTGTT AACAGCTTGG 4200
CACTGGCCGT CGTTTTACAA CGTCGTGACT GGGAAAACCC TGGCGTTACC 4250
CAACTTAATC GCCTTGCAGC ACATCCCCCC TTCGCCAGCT GGCGTAATAG 4300
CGAAGAGGCC CGCACCGATC GCCCTTCCCA ACAGTTGCGT AGCCTGAATG 4350
GCGAATGGCG CCTGATGCGG TATTTTCTCC TTACGCATCT GTGCGGTATT 4400
TCACACCGCA TACGTCAAAG CAACCATAGT ACGCGCCCTG TAGCGGCGCA 4450
TTAAGCGCGG CGGGTGTGGT GGTTACGCGC AGCGTGACCG CTACACTTGC 4500
CAGCGCCCTA GCGCCCGCTC CTTTCGCTTT CTTCCCTTCC TTTCTCGCCA 4550
CGTTCGCCGG CTTTCCCCGT CAAGCTCTAA ATCGGGGGCT CCCTTTAGGG 4600
TTCCGATTTA GTGCTTTACG GCACCTCGAC CCCAAAAAAC TTGATTTGGG 4650
TGATGGTTCA CGTAGTGGGC CATCGCCCTG ATAGACGGTT TTTCGCCCTT 4700
TGACGTTGGA GTCCACGTTC TTTAATAGTG GACTCTTGTT CCAAACTGGA 4750
ACAACACTCA ACCCTATCTC GGGCTATTCT TTTGATTTAT AAGGGATTTT 4800
GCCGATTTCG GCCTATTGGT TAAAAAATGA GCTGATTTAA CAAAAATTTA 4850
ACGCGAATTT TAACAAAATA TTAACGTTTA CAATTTTATG GTGCACTCTC 4900
AGTACAATCT GCTCTGATGC CGCATAGTTA AGCCAACTCC GCTATCGCTA 4950
CGTGACTGGG TCATGGCTGC GCCCCGACAC CCGCCAACAC CCGCTGACGC 5000
GCCCTGACGG GCTTGTCTGC TCCCGGCATC CGCTTACAGA CAAGCTGTGA 5050
CCGTCTCCGG GAGCTGCATG TGTCAGAGGT TTTCACCGTC ATCACCGAAA 5100
CGCGCGAGGC AGTATTCTTG AAGACGAAAG GGCCTCGTGA TACGCCTATT 5150
TTTATAGGTT AATGTCATGA TAATAATGGT TTCTTAGACG TCAGGTGGCA 5200
CTTTTCGGGG AAATGTGCGC GGAACCCCTA TTTGTTTATT TTTCTAAATA 5250
CATTCAAATA TGTATCCGCT CATGAGACAA TAACCCTGAT AAATGCTTCA 5300
ATAATATTGA AAAAGGAAGA GTATGAGTAT TCAACATTTC CGTGTCGCCC 5350
TTATTCCCTT TTTTGCGGCA TTTTGCCTTC CTGTTTTTGC TCACCCAGAA 5400
ACGCTGGTGA AAGTAAAAGA TGCTGAAGAT CAGTTGGGTG CACGAGTGGG 5450
TTACATCGAA CTGGATCTCA ACAGCGGTAA GATCCTTGAG AGTTTTCGCC 5500
CCGAAGAACG TTTTCCAATG ATGAGCACTT TTAAAGTTCT GCTATGTGGC 5550
GCGGTATTAT CCCGTGATGA CGCCGGGCAA GAGCAACTCG GTCGCCGCAT 5600
ACACTATTCT CAGAATGACT TGGTTGAGTA CTCACCAGTC ACAGAAAAGC 5650
ATCTTACGGA TGGCATGACA GTAAGAGAAT TATGCAGTGC TGCCATAACC 5700
ATGAGTGATA ACACTGCGGC CAACTTACTT CTGACAACGA TCGGAGGACC 5750
GAAGGAGCTA ACCGCTTTTT TGCACAACAT GGGGGATCAT GTAACTCGCC 5800
TTGATCGTTG GGAACCGGAG CTGAATGAAG CCATACCAAA CGACGAGCGT 5850
GACACCACGA TGCCAGCAGC AATGGCAACA ACGTTGCGCA AACTATTAAC 5900
TGGCGAACTA CTTACTCTAG CTTCCCGGCA ACAATTAATA GACTGGATGG 5950
AGGCGGATAA AGTTGCAGGA CCACTTCTGC GCTCGGCCCT TCCGGCTGGC 6000
TGGTTTATTG CTGATAAATC TGGAGCCGGT GAGCGTGGGT CTCGCGGTAT 6050
CATTGCAGCA CTGGGGCCAG ATGGTAAGCC CTCCCGTATC GTAGTTATCT 6100
ACACGACGGG GAGTCAGGCA ACTATGGATG AACGAAATAG ACAGATCGCT 6150
GAGATAGGTG CCTCACTGAT TAAGCATTGG TAACTGTCAG ACCAAGTTTA 6200
CTCATATATA CTTTAGATTG ATTTAAAACT TCATTTTTAA TTTAAAAGGA 6250
TCTAGGTGAA GATCCTTTTT GATAATCTCA TGACCAAAAT CCCTTAACGT 6300
GAGTTTTCGT TCCACTGAGC GTCAGACCCC GTAGAAAAGA TCAAAGGATC 6350
TTCTTGAGAT CCTTTTTTTC TGCGCGTAAT CTGCTGCTTG CAAACAAAAA 6400
AACCACCGCT ACCAGCGGTG GTTTGTTTGC CGGATCAAGA GCTACCAACT 6450
CTTTTTCCGA AGGTAACTGG CTTCAGCAGA GCGCAGATAC CAAATACTGT 6500
CCTTCTAGTC TAGCCGTAGT TAGGCCACCA CTTCAAGAAC TCTGTAGCAC 6550
CGCCTACATA CCTCGCTCTG CTAATCCTGT TACCAGTGGC TGCTGCCAGT 6600
GGCGATAAGT CGTGTCTTAC CGGGTTGGAC TCAAGACGAT AGTTACCGGA 6650
TAAGGCGCAG CGGTCGGGCT GAACGGGGGG TTCGTGCACA CAGCCCAGCT 6700
TGGAGCGAAC GACCTACACC GAACTGAGAT ACCTACAGCG TGAGCATTGA 6750
GAAAGCGCCA CGCTTCCCGA AGGGAGAAAG GCGGACAGGT ATCCGGTAAG 6800
CGGCAGGGTC GGAACAGGAG AGCGCACGAG GGAGCTTCCA GGGGGAAACG 6850
CCTGGTATCT TTATAGTCCT GTCGGGTTTC GCCACCTCTG ACTTGAGCGT 6900
CGATTTTTGT GATGCTCGTC AGGGGGGCGG AGCCTATGGA AAAACGCCAG 6950
CAACGCGGCC TTTTTACGGT TCCTGGCCTT TTGCTGGCCT TTTGCTCACA 7000
TGTTCTTTCC TGCGTTATCC CCTGATTCTG TGGATAACCG TATTACCGCC 7050
TTTGAGTGAG CTGATACCGC TCGCCGCAGC CGAACGACCG AGCGCAGCGA 7100
GTCAGTGAGC GAGGAAGCGG AAGAGCGCCC AATACGCAAA CCGCCTCTCC 7150
CCGCGCGTTG GCCGATTCAT TAATCCAGCT GGCACGACAG GTTTCCCGAC 7200
TGGAAAGCGG GCAGTGAGCG CAACGCAATT AATGTGAGTT ACCTCACTCA 7250
TTAGGCACCC CAGGCTTTAC ACTTTATGCT TCCGGCTCGT ATGTTGTGTG 7300
GAATTGTGAG CGGATAACAA TTTCAGACAG GAAACAGCTA TGACCATGAT 7350
TACGAATTAA 7360

(2) INFORMATION FOR SEQ ID NO:2: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 6889 bases 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2: 
TTCGAGCTCG CCCGACATTG ATTATTGACT AGTTATTAAT AGTAATCAAT 50 
TACGGGGTCA TTAGTTCATA GCCCATATAT GGAGTTCCGC GTTACATAAC 100 
TTACGGTAAA TGGCCCGCCT GGCTGACCGC CCAACGACCC CCGCCCATTG 150 
ACGTCAATAA TGACGTATGT TCCCATAGTA ACGCCAATAG GGACTTTCCA 200 
TTGACGTCAA TGGGTGGAGT ATTTACGGTA AACTGCCCAC TTGGCAGTAC 250 
ATCAAGTGTA TCATATGCCA AGTACGCCCC CTATTGACGT CAATGACGGT 300 
AAATGGCCCG CCTGGCATTA TGCCCAGTAC ATGACCTTAT GGGACTTTCC 350 
TACTTGGCAG TACATCTACG TATTAGTCAT CGCTATTACC ATGGTGATGC 400 
GGTTTTGGCA GTACATCAAT GGGCGTGGAT AGCGGTTTGA CTCACGGGGA 450 
TTTCCAAGTC TCCACCCCAT TGACGTCAAT GGGAGTTTGT TTTGGCACCA 500 
AAATCAACGG GACTTTCCAA AATGTCGTAA CAACTCCGCC CCATTGACGC 550
AAATGGGCGG TAGGCGTGTA CGGTGGGAGG TCTATATAAG CAGAGCTCGT 600
TTAGTGAACC GTCAGATCGC CTGGAGACGC CATCCACGCT GTTTTGACCT 650 
CCATAGAAGA CACCGGGACC GATCCAGCCT CCGCGGCCGG GAACGGTGCA 700 
TTGGAACGCG GATTCCCCGT GCCAAGAGTG CTGTAAGTAC CGCCTATAGA 750
GCGATAAGAG GATTTTATCC CCGCTGCCAT CATGGTTCGA CCATTGAACT 800
GCATCGTCGC CGTGTCCCAA AATATGGGGA TTGGCAAGAA CGGAGACCTA 850
CCCTGGCCTC CGCTCAGGAA CGAGTTCAAG TACTTCCAAA GAATGACCAC 900
AACCTCTTCA GTGGAAGGTA AACAGAATCT GGTGATTATG GGTAGGAAAA 950
CCTGGTTCTC CATTCCTGAG AAGAATCGAC CTTTAAAGGA CAGAATTAAT 1000
ATAGTTCTCA GTAGAGAACT CAAAGAACCA CCACGAGGAG CTCATTTTCT 1050
TGCCAAAAGT TTGGATGATG CCTTAAGACT TATTGAACAA CCGGAATTGG 1100
CAAGTAAAGT AGACATGGTT TGGATAGTCG GAGGCAGTTC TGTTTACCAG 1150
GAAGCCATGA ATCAACCAGG CCACCTTAGA CTCTTTGTGA CAAGGATCAT 1200
GCAGGAATTT GAAAGTGACA CGTTTTTCCC AGAAATTGAT TTGGGGAAAT 1250
ATAAACCTCT CCCAGAATAC CCAGGCGTCC TCTCTGAGGT CCAGGAGGAA 1300
AAAGGCATCA AGTATAAGTT TGAAGTCTAC GAGAAGAAAG ACTAACAGGA 1350
AGATGCTTTC AAGTTCTCTG CTCCCCTCCT AAAGCTATGC ATTTTTATAA 1400
GACCATGGGA CTTTTGCTGG CTTTAGACCC CCTTGGCTTC GTTAGAACGC 1450
GGCTACAATT AATACATAAC CTTATGTATC ATACACATAG ATTTAGGTGA 1500
CACTATAGAA TAACATCCAC TTTGCCTTTC TCTCCACAGG TGTCACTCCA 1550
GGTCAACTGC ACCTCGGTTC TATCGATTGA ATTCCCCGGC CATAGCTGTC 1600
TGGCATGGGC CTCTCCACCG TGCCTGACCT GCTGCTGCCG CTGGTGCTCC 1650
TGGAGCTGTT GGTGGGAATA TACCCCTCAG GGGTTATTGG ACTGGTCCCT 1700
CACCTAGGGG ACAGGGAGAA GAGAGATAGT GTGTGTCCCC AAGGAAAATA 1750
TATCCACCCT CAAAATAATT CGATTTGCTG TACCAAGTGC CACAAAGGAA 1800
CCTACTTGTA CAATGACTGT CCAGGCCCGG GGCAGGATAC GGACTGCAGG 1850
GAGTGTGAGA GCGGCTCCTT CACCGCTTCA GAAAACCACC TCAGACACTG 1900
CCTCAGCTGC TCCAAATGCC GAAAGGAAAT GGGTCAGGTG GAGATCTCTT 1950
CTTGCACAGT GGACCGGGAC ACCGTGTGTG GCTGCAGGAA GAACCAGTAC 2000
CGGCATTATT GGAGTGAAAA CCTTTTCCAG TGCTTCAATT GCAGCCTCTG 2050
CCTCAATGGG ACCGTGCACC TCTCCTGCCA GGAGAAACAG AACACCGTGT 2100
GCACCTGCCA TGCAGGTTTC TTTCTAAGAG AAAACGAGTG TGTCTCCTGT 2150
AGTAACTGTA AGAAAAGCCT GGAGTGCACG AAGTTGTGCC TACCCCAGAT 2200
TGAGAATGTT AAGGGCACTG AGGACTCAGG CACCACAGAC AAGAGAGTTG 2250
AGCTCAAAAC CCCACTTGGT GACACAACTC ACACATGCCC ACGGTGCCCA 2300
GAGCCCAAAT CTTGTGACAC ACCTCCCCCG TGCCCACGGT GCCCAGAGCC 2350
CAAATCTTGT GACACACCTC CCCCATGCCC ACGGTGCCCA GAGCCCAAAT 2400
CTTGTGACAC ACCTCCCCCA TGCCCACGGT GCCCAGCACC TGAACTCCTG 2450
GGAGGACCGT CAGTCTTCCT CTTCCCCCCA AAACCCAAGG ATACCCTTAT 2500
GATTTCCCGG ACCCCTGAGG TCACGTGCGT GGTGGTGGAC GTGAGCCACG 2550
AAGACCCCGA GGTCCAGTTC AAGTGGTACG TGGACGGCGT GGAGGTGCAT 2600
AATGCCAAGA CAAAGCCGCG GGAGGAGCAG TTCAACAGCA CGTTCCGTGT 2650
GGTCAGCGTC CTCACCGTCC TGCACCAGGA CTGGCTGAAC GGCAAGGAGT 2700
ACAAGTGCAA GGTCTCCAAC AAAGCCCTCC CAGCCCCCAT CGAGAAAACC 2750
ATCTCCAAAA CCAAAGGACA GCCCCGAGAA CCACAGGTGT ACACCCTGCC 2800
CCCATCCCGG GAGGAGATGA CCAAGAACCA GGTCAGCCTG ACCTGCCTGG 2850
TCAAAGGCTT CTACCCCAGC GACATCGCCG TGGAGTGGGA GAGCAGCGGG 2900
CAGCCGGAGA ACAACTACAA CACCACGCCT CCCATGCTGG ACTCCGACGG 2950
CTCCTTCTTC CTCTACAGCA AGCTCACCGT GGACAAGAGC AGGTGGCAGC 3000
AGGGGAACAT CTTCTCATGC TCCGTGATGC ATGAGGCTCT GCACAACCGC 3050
TTCACGCAGA AGAGCCTCTC CCTGTCTCCG GGTAAATGAG TGCGACGGCC 3100
GGGGATCCTC TAGAGTCGAC CTGCAGAAGC TTGGCCGCCA TGGCCCAACT 3150
TGTTTATTGC AGCTTATAAT GGTTACAAAT AAAGCAATAG CATCACAAAT 3200
TTCACAAATA AAGCATTTTT TTCACTGCAT TCTAGTTGTG GTTTGTCCAA 3250
ACTCATCAAT GTATCTTATC ATGTCTGGAT CGATCGGGAA TTAATTCGGC 3300
GCAGCACCAT GGCCTGAAAT AACCTCTGAA AGAGGAACTT GGTTAGGTAC 3350
CTTCTGAGGC GGAAAGAACC AGCTGTGGAA TGTGTGTCAG TTAGGGTGTG 3400
GAAAGTCCCC AGGCTCCCCA GCAGGCAGAA GTATGCAAAG CATGCATCTC 3450
AATTAGTCAG CAACCAGGTG TGGAAAGTCC CCAGGCTCCC CAGCAGGCAG 3500
AAGTATGCAA AGCATGCATC TCAATTAGTC AGCAACCATA GTCCCGCCCC 3550
TAACTCCGCC CATCCCGCCC CTAACTCCGC CCAGTTCCGC CCATTCTCCG 3600
CCCCATGGCT GACTAATTTT TTTTATTTAT GCAGAGGCCG AGGCCGCCTC 3650
GGCCTCTGAG CTATTCCAGA AGTAGTGAGG AGGCTTTTTT GGAGGCCTAG 3700
GCTTTTGCAA AAAGCTGTTA ACAGCTTGGC ACTGGCCGTC GTTTTACAAC 3750
GTCGTGACTG GGAAAACCCT GGCGTTACCC AACTTAATCG CCTTGCAGCA 3800
CATCCCCCCT TCGCCAGCTG GCGTAATAGC GAAGAGGCCC GCACCGATCG 3850
CCCTTCCCAA CAGTTGCGTA GCCTGAATGG CGAATGGCGC CTGATGCGGT 3900
ATTTTCTCCT TACGCATCTG TGCGGTATTT CACACCGCAT ACGTCAAAGC 3950
AACCATAGTA CGCGCCCTGT AGCGGCGCAT TAAGCGCGGC GGGTGTGGTG 4000
GTTACGCGCA GCGTGACCGC TACACTTGCC AGCGCCCTAG CGCCCGCTCC 4050
TTTCGCTTTC TTCCCTTCCT TTCTCGCCAC GTTCGCCGGC TTTCCCCGTC 4100
AAGCTCTAAA TCGGGGGCTC CCTTTAGGGT TCCGATTTAG TGCTTTACGG 4150
CACCTCGACC CCAAAAAACT TGATTTGGGT GATGGTTCAC GTAGTGGGCC 4200
ATCGCCCTGA TAGACGGTTT TTCGCCCTTT GACGTTGGAG TCCACGTTCT 4250
TTAATAGTGG ACTCTTGTTC CAAACTGGAA CAACACTCAA CCCTATCTCG 4300
GGCTATTCTT TTGATTTATA AGGGATTTTG CCGATTTCGG CCTATTGGTT 4350
AAAAAATGAG CTGATTTAAC AAAAATTTAA CGCGAATTTT AACAAAATAT 4400
TAACGTTTAC AATTTTATGG TGCACTCTCA GTACAATCTG CTCTGATGCC 4450
GCATAGTTAA GCCAACTCCG CTATCGCTAC GTGACTGGGT CATGGCTGCG 4500
CCCCGACACC CGCCAACACC CGCTGACGCG CCCTGACGGG CTTGTCTGCT 4550
CCCGGCATCC GCTTACAGAC AAGCTGTGAC CGTCTCCGGG AGCTGCATGT 4600
GTCAGAGGTT TTCACCGTCA TCACCGAAAC GCGCGAGGCA GTATTCTTGA 4650
AGACGAAAGG GCCTCGTGAT ACGCCTATTT TTATAGGTTA ATGTCATGAT 4700
AATAATGGTT TCTTAGACGT CAGGTGGCAC TTTTCGGGGA AATGTGCGCG 4750
GAACCCCTAT TTGTTTATTT TTCTAAATAC ATTCAAATAT GTATCCGCTC 4800
ATGAGACAAT AACCCTGATA AATGCTTCAA TAATATTGAA AAAGGAAGAG 4850
TATGAGTATT CAACATTTCC GTGTCGCCCT TATTCCCTTT TTTGCGGCAT 4900
TTTGCCTTCC TGTTTTTGCT CACCCAGAAA CGCTGGTGAA AGTAAAAGAT 4950
GCTGAAGATC AGTTGGGTGC ACGAGTGGGT TACATCGAAC TGGATCTCAA 5000
CAGCGGTAAG ATCCTTGAGA GTTTTCGCCC CGAAGAACGT TTTCCAATGA 5050
TGAGCACTTT TAAAGTTCTG CTATGTGGCG CGGTATTATC CCGTGATGAC 5100
GCCGGGCAAG AGCAACTCGG TCGCCGCATA CACTATTCTC AGAATGACTT 5150
GGTTGAGTAC TCACCAGTCA CAGAAAAGCA TCTTACGGAT GGCATGACAG 5200
TAAGAGAATT ATGCAGTGCT GCCATAACCA TGAGTGATAA CACTGCGGCC 5250
AACTTACTTC TGACAACGAT CGGAGGACCG AAGGAGCTAA CCGCTTTTTT 5300
GCACAACATG GGGGATCATG TAACTCGCCT TGATCGTTGG GAACCGGAGC 5350
TGAATGAAGC CATACCAAAC GACGAGCGTG ACACCACGAT GCCAGCAGCA 5400
ATGGCAACAA CGTTGCGCAA ACTATTAACT GGCGAACTAC TTACTCTAGC 5450
TTCCCGGCAA CAATTAATAG ACTGGATGGA GGCGGATAAA GTTGCAGGAC 5500
CACTTCTGCG CTCGGCCCTT CCGGCTGGCT GGTTTATTGC TGATAAATCT 5550
GGAGCCGGTG AGCGTGGGTC TCGCGGTATC ATTGCAGCAC TGGGGCCAGA 5600
TGGTAAGCCC TCCCGTATCG TAGTTATCTA CACGACGGGG AGTCAGGCAA 5650
CTATGGATGA ACGAAATAGA CAGATCGCTG AGATAGGTGC CTCACTGATT 5700
AAGCATTGGT AACTGTCAGA CCAAGTTTAC TCATATATAC TTTAGATTGA 5750
TTTAAAACTT CATTTTTAAT TTAAAAGGAT CTAGGTGAAG ATCCTTTTTG 5800
ATAATCTCAT GACCAAAATC CCTTAACGTG AGTTTTCGTT CCACTGAGCG 5850
TCAGACCCCG TAGAAAAGAT CAAAGGATCT TCTTGAGATC CTTTTTTTCT 5900
GCGCGTAATC TGCTGCTTGC AAACAAAAAA ACCACCGCTA CCAGCGGTGG 5950
TTTGTTTGCC GGATCAAGAG CTACCAACTC TTTTTCCGAA GGTAACTGGC 6000
TTCAGCAGAG CGCAGATACC AAATACTGTC CTTCTAGTGT AGCCGTAGTT 6050
AGGCCACCAC TTCAAGAACT CTGTAGCACC GCCTACATAC CTCGCTCTGC 6100
TAATCCTGTT ACCAGTGGCT GCTGCCAGTG GCGATAAGTC GTGTCTTACC 6150
GGGTTGGACT CAAGACGATA GTTACCGGAT AAGGCGCAGC GGTCGGGCTG 6200
AACGGGGGGT TCGTGCACAC AGCCCAGCTT GGAGCGAACG ACCTACACCG 6250
AACTGAGATA CCTACAGCGT GAGCATTGAG AAAGCGCCAC GCTTCCCGAA 6300
GGGAGAAAGG CGGACAGGTA TCCGGTAAGC GGCAGGGTCG GAACAGGAGA 6350
GCGCACGAGG GAGCTTCCAG GGGGAAACGC CTGGTATCTT TATAGTCCTG 6400
TCGGGTTTCG CCACCTCTGA CTTGAGCGTC GATTTTTGTG ATGCTCGTCA 6450
GGGGGGCGGA GCCTATGGAA AAACGCCAGC AACGCGGCCT TTTTACGGTT 6500
CCTGGCCTTT TGCTGGCCTT TTGCTCACAT GTTCTTTCCT GCGTTATCCC 6550
CTGATTCTGT GGATAACCGT ATTACCGCCT TTGAGTGAGC TGATACCGCT 6600
CGCCGCAGCC GAACGACCGA GCGCAGCGAG TCAGTGAGCG AGGAAGCGGA 6650
AGAGCGCCCA ATACGCAAAC CGCCTCTCCC CGCGCGTTGG CCGATTCATT 6700
AATCCAGCTG GCACGACAGG TTTCCCGACT GGAAAGCGGG CAGTGAGCGC 6750
AACGCAATTA ATGTGAGTTA CCTCACTCAT TAGGCACCCC AGGCTTTACA 6800
CTTTATGCTT CCGGCTCGTA TGTTGTGTGG AATTGTGAGC GGATAACAAT 6850
TTCACACAGG AAACAGCTAT GACCATGATT ACGAATTAA 6889
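Each line of the listing ends with the cumulative base count, so the last number should equal the declared LENGTH (for SEQ ID NO: 2, 6850 + 39 bases on the final line = 6889). Because OCR damage shows up either as non-nucleotide characters or as a wrong margin number, a short consistency check can flag suspect lines. The sketch below is a hypothetical helper (`check_listing` is not part of the patent) and assumes the layout used here: groups of up to ten bases separated by spaces, followed by the running position.

```python
import re

# One listing line: groups of 1-10 bases separated by spaces, then a space
# and the running position printed at the right margin.
LINE = re.compile(r"^((?:[ACGTN]{1,10} )*[ACGTN]{1,10}) (\d+)$")


def check_listing(lines, declared_length, offset=0):
    """Concatenate listing lines and report numbering problems.

    `offset` is the number of bases preceding the first line supplied, so a
    tail excerpt can be checked against the full declared LENGTH.
    Returns (sequence, list_of_problem_strings).
    """
    seq, problems = [], []
    count = offset
    for n, raw in enumerate(lines, 1):
        m = LINE.match(raw.strip())
        if m is None:
            # Non-ACGTN characters (e.g. OCR debris) or a missing position number.
            problems.append(f"line {n}: unparseable: {raw.strip()!r}")
            continue
        bases = m.group(1).replace(" ", "")
        seq.append(bases)
        count += len(bases)
        if int(m.group(2)) != count:
            problems.append(f"line {n}: printed {m.group(2)}, counted {count}")
    if count != declared_length:
        problems.append(f"total {count} != declared LENGTH {declared_length}")
    return "".join(seq), problems


# The last two lines of SEQ ID NO: 2, with a hypothetical misprinted count.
tail = [
    "CTTTATGCTT CCGGCTCGTA TGTTGTGTGG AATTGTGAGC GGATAACAAT 6850",
    "TTCACACAGG AAACAGCTAT GACCATGATT ACGAATTAA 6869",
]
seq, problems = check_listing(tail, declared_length=6889, offset=6800)
print(problems)  # ['line 2: printed 6869, counted 6889']
```

A line skipped as unparseable also throws off every later count, which is intentional here: the first reported line marks where manual review should start.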

(2) INFORMATION FOR SEQ ID NO: 3: 

(i) SEQUENCE CHARACTERISTICS : 

(A) LENGTH: 6557 bases 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3: 
TTCGAGCTCG CCCGACATTG ATTATTGACT AGAGTCGATC GACAGCTGTG 50
GAATGTGTGT CAGTTAGGGT GTGGAAAGTC CCCAGGCTCC CCAGCAGGCA 100
GAAGTATGCA AAGCATGCAT CTCAATTAGT CAGCAACCAG GTGTGGAAAG 150
TCCCCAGGCT CCCCAGCAGG CAGAAGTATG CAAAGCATGC ATCTCAATTA 200
GTCAGCAACC ATAGTCCCGC CCCTAACTCC GCCCATCCCG CCCCTAACTC 250
CGCCCAGTTC CGCCCATTCT CCGCCCCATG GCTGACTAAT TTTTTTTATT 300
TATGCAGAGG CCGAGGCCGC CTCGGCCTCT GAGCTATTCC AGAAGTAGTG 350
AGGAGGCTTT TTTGGAGGCC TAGGCTTTTG CAAAAAGCTA GCTTATCCGG 400
CCGGGAACGG TGCATTGGAA CGCGGATTCC CCGTGCCAAG AGTGACGTAA 450
GTACCGCCTA TAGAGCGATA AGAGGATTTT ATCCCCGCTG CCATCATGGT 500
TCGACCATTG AACTGCATCG TCGCCGTGTC CCAAAATATG GGGATTGGCA 550
AGAACGGAGA CCTACCCTGG CCTCCGCTCA GGAACGAGTT CAAGTACTTC 600
CAAAGAATGA CCACAACCTC TTCAGTGGAA GGTAAACAGA ATCTGGTGAT 650
TATGGGTAGG AAAACCTGGT TCTCCATTCC TGAGAAGAAT GGACCTTTAA 700
AGGACAGAAT TAATATAGTT CTCAGTAGAG AACTCAAAGA ACCACCACGA 750
GGAGCTCATT TTCTTGCCAA AAGTTTGGAT GATGCCTTAA GACTTATTGA 800
ACAACCGGAA TTGGCAAGTA AAGTAGACAT GGTTTGGATA GTCGGAGGCA 850
GTTCTGTTTA CCAGGAAGCC ATGAATCAAC CAGGCCACCT TAGACTCTTT 900
GTGACAAGGA TCATGCAGGA ATTTGAAAGT GACACGTTTT TCCCAGAAAT 950
TGATTTGGGG AAATATAAAC CTCTCCCAGA ATACCCAGGC GTCCTCTCTG 1000
AGGTCCAGGA GGAAAAAGGC ATCAAGTATA AGTTTGAAGT CTACGAGAAG 1050
AAAGACTAAC AGGAAGATGC TTTCAAGTTC TCTGCTCCCC TCCTAAAGCT 1100
ATGCATTTTT ATAAGACCAT GGGACTTTTG CTGGCTTTAG ATCCCCTTGG 1150
CTTCGTTAGA ACGCAGCTAC AATTAATACA TAACCTTATG TATCATACAC 1200
ATACGATTTA GGTGACACTA TAGATAACAT CCACTTTGCC TTTCTCTCCA 1250
CAGGTGTCCA CTCCCAGGTC CAACTGCACC TCGGTTCTAT CGATTGAATT 1300
CCACCATGGG ATGGTCATGT ATCATCCTTT TTCTAGTAGC AACTGCAACT 1350
GGAGTACATT CAGAAGTTCA GCTGGTGGAG TCTGGCGGTG GCCTGGTGCA 1400
GCCAGGGGGC TCACTCCGTT TGTCCTGTGC AGTTTCTGGC TACTCCATCA 1450
CCTCCGGATA TAGCTGGAAC TGGATCCGTC AGGCCCCGGG TAAGGGCCTG 1500
GAATGGGTTG CATCGATTAC GTATGCCGGA TCGACTAACT ATAACCCTAG 1550
CGTCAAGGGC CGTATCACTA TAAGTCGCGA CGATTCCAAA AACACATTCT 1600
ACCTGCAGAT GAACAGCCTG CGTGCTGAGG ACACTGCCGT CTATTATTGT 1650
GCTCGAGGCA GCCACTATTT CGGCGCCTGG CACTTCXKZCQ TGTGGGGTCA 1700
AGGAACCCTG GTCACCGTCT CCTCGGCCTC CACCAAGGGC CCATCGGTCT 1750
TCCCCCTGGC ACCCTCCTCC AAGAGCACCT CTGGGGGCAC AGCGGCCCTG 1800
GGCTGCCTGG TCAAGGACTA CTTCCCCGAA CCGGTGACGG TGTCGTGGAA 1850
CTCAGGCGCC CTGACCAGCG GCGTGCACAC CTTCCCGGCT GTCCTACAGT 1900
CCTCAGGACT CTACTCCCTC AGCAGCGTGG TGACTGTGCC CTCTAGCAGC 1950
TTGGGCACCC AGACCTACAT CTGCAACGTG AATCACAAGC CCAGCAACAC 2000
CAAGGTGGAC AAGAAAGTTG AGCCCAAATC TTGTGACAAA ACTCACACAT 2050
GCCCACCGTG CCCAGCACCT GAACTCCTGG GGGGACCGTC AGTCTTCCTC 2100
TTCCCCCCAA AACCCAAGGA CACCCTCATG ATCTCCCGGA CCCCTGAGGT 2150
CACATGCGTG GTGGTGGACG TGAGCCACGA AGACCCTGAG GTCAAGTTCA 2200
ACTGGTACGT GGACGGCGTG GAGGTGCATA ATGCCAAGAC AAAGCCGCGG 2250
GAGGAGCAGT ACAACAGCAC GTACCGTGTG GTCAGCGTCC TCACCGTCCT 2300
GCACCAGGAC TGGCTGAATG GCAAGGAGTA CAAGTGCAAG GTCTCCAACA 2350
AAGCCCTCCC AGCCCCCATC GAGAAAACCA TCTCCAAAGC CAAAGGGCAG 2400
CCCCGAGAAC CACAGGTGTA CACCCTGCCC CCATCCCGGG AAGAGATGAC 2450
CAAGAACCAG GTCAGCCTGA CCTGCCTGGT CAAAGGCTTC TATCCCAGCG 2500
ACATCGCCGT GGAGTGGGAG AGCAATGGGC AGCCGGAGAA CAACTACAAG 2550
ACCACGCCTC CCGTGCTGGA CTCCGACGGC TCCTTCTTCC TCTACAGCAA 2600
GCTCACCGTG GACAAGAGCA GGTGGCAGCA GGGGAACGTC TTCTCATGCT 2650
CCGTGATGCA TGAGGCTCTG CACAACCACT ACACGCAGAA GAGCCTCTCC 2700
CTGTCTCCGG GTAAATGAGT GCGACGGCCC TAGAGTCGAC CTGCAGAAGC 2750
TTGGCCGCCA TGGCCCAACT TGTTTATTGC AGCTTATAAT GGTTACAAAT 2800
AAAGCAATAG CATCACAAAT TTCACAAATA AAGCATTTTT TTCACTGCAT 2850
TCTAGTTGTG GTTTGTCCAA ACTCATCAAT GTATCTTATC ATGTCTGGAT 2900
CGATCGGGAA TTAATTCGGC GCAGCACCAT GGCCTGAAAT AACCTCTGAA 2950
AGAGGAACTT GGTTAGGTAC CTTCTGAGGC GGAAAGAACC AGCTGTGGAA 3000
TGTGTGTCAG TTAGGGTGTG GAAAGTCCCC AGGCTCCCCA GCAGGCAGAA 3050
GTATGCAAAG CATGCATCTC AATTAGTCAG CAACCAGGTG TGGAAAGTCC 3100
CCAGGCTCCC CAGCAGGCAG AAGTATGCAA AGCATGCATC TCAATTAGTC 3150
AGCAACCATA GTCCCGCCCC TAACTCCGCC CATCCCGCCC CTAACTCCGC 3200
CCAGTTCCGC CCATTCTCCG CCCCATGGCT GACTAATTTT TTTTATTTAT 3250
GCAGAGGCCG AGGCCGCCTC GGCCTCTGAG CTATTCCAGA AGTAGTGAGG 3300
AGGCTTTTTT GGAGGCCTAG GCTTTTGCAA AAAGCTGTTA CCTCGAGCGG 3350
CCGCTTAATT AAGGCGCGCC ATTTAAATCC TGCAGGTAAC AGCTTGGCAC 3400
TGGCCGTCGT TTTACAACGT CGTGACTGGG AAAACCCTGG CGTTACCCAA 3450
CTTAATCGCC TTGCAGCACA TCCCCCCTTC GCCAGCTGGC GTAATAGCGA 3500
AGAGGCCCGC ACCGATCGCC CTTCCCAACA GTTGCGTAGC CTGAATGGCG 3550
AATGGCGCCT GATGCGGTAT TTTCTCCTTA CGCATCTGTG CGGTATTTCA 3600
CACCGCATAC GTCAAAGCAA CCATAGTACG CGCCCTGTAG CGGCGCATTA 3650
AGCGCGGCGG GTGTGGTGGT TACGCGCAGC GTGACCGCTA CACTTGCCAG 3700
CGCCCTAGCG CCCGCTCCTT TCGCTTTCTT CCCTTCCTTT CTCGCCACGT 3750
TCGCCGGCTT TCCCCGTCAA GCTCTAAATC GGGGGCTCCC TTTAGGGTTC 3800
CGATTTAGTG CTTTACGGCA CCTCGACCCC AAAAAACTTG ATTTGGGTGA 3850
TGGTTCACGT AGTGGGCCAT CGCCCTGATA GACGGTTTTT CGCCCTTTGA 3900
CGTTGGAGTC CACGTTCTTT AATAGTGGAC TCTTGTTCCA AACTGGAACA 3950
ACACTCAACC CTATCTCGGG CTATTCTTTT GATTTATAAG GGATTTTGCC 4000
GATTTCGGCC TATTGGTTAA AAAATGAGCT GATTTAACAA AAATTTAACG 4050
CGAATTTTAA CAAAATATTA ACGTTTACAA TTTTATGGTG CACTCTCAGT 4100
ACAATCTGCT CTGATGCCGC ATAGTTAAGC CAACTCCGCT ATCGCTACGT 4150
GACTGGGTCA TGGCTGCGCC CCGACACCCG CCAACACCCG CTGACGCGCC 4200
CTGACGGGCT TGTCTGCTCC CGGCATCCGC TTACAGACAA GCTGTGACCG 4250
TCTCCGGGAG CTGCATGTGT CAGAGGTTTT CACCGTCATC ACCGAAACGC 4300
GCGAGGCAGT ATTCTTGAAG ACGAAAGGGC CTCGTGATAC GCCTATTTTT 4350
ATAGGTTAAT GTCATGATAA TAATGGTTTC TTAGACGTCA GGTGGCACTT 4400
TTCGGGGAAA TGTGCGCGGA ACCCCTATTT GTTTATTTTT CTAAATACAT 4450
TCAAATATGT ATCCGCTCAT GAGACAATAA CCCTGATAAA TGCTTCAATA 4500
ATATTGAAAA AGGAAGAGTA TGAGTATTCA ACATTTCCGT GTCGCCCTTA 4550
TTCCCTTTTT TGCGGCATTT TGCCTTCCTG TTTTTGCTCA CCCAGAAACG 4600
CTGGTGAAAG TAAAAGATGC TGAAGATCAG TTGGGTGCAC GAGTGGGTTA 4650
CATCGAACTG GATCTCAACA GCGGTAAGAT CCTTGAGAGT TTTCGCCCCG 4700
AAGAACGTTT TCCAATGATG AGCACTTTTA AAGTTCTGCT ATGTGGCGCG 4750
GTATTATCCC GTGATGACGC CGGGCAAGAG CAACTCGGTC GCCGCATACA 4800
CTATTCTCAG AATGACTTGG TTGAGTACTC ACCAGTCACA GAAAAGCATC 4850
TTACGGATGG CATGACAGTA AGAGAATTAT GCAGTGCTGC CATAACCATG 4900
AGTGATAACA CTGCGGCCAA CTTACTTCTG ACAACGATCG GAGGACCGAA 4950
GGAGCTAACC GCTTTTTTGC ACAACATGGG GGATCATGTA ACTCGCCTTG 5000
ATCGTTGGGA ACCGGAGCTG AATGAAGCCA TACCAAACGA CGAGCGTGAC 5050
ACCACGATGC CAGCAGCAAT GGCAACAACG TTGCGCAAAC TATTAACTGG 5100
CGAACTACTT ACTCTAGCTT CCCGGCAACA ATTAATAGAC TGGATGGAGG 5150
CGGATAAAGT TGCAGGACCA CTTCTGCGCT CGGCCCTTCC GGCTGGCTGG 5200
TTTATTGCTG ATAAATCTGG AGCCGGTGAG CGTGGGTCTC GCGGTATCAT 5250
TGCAGCACTG GGGCCAGATG GTAAGCCCTC CCGTATCGTA GTTATCTACA 5300
CGACGGGGAG TCAGGCAACT ATGGATGAAC GAAATAGACA GATCGCTGAG 5350
ATAGGTGCCT CACTGATTAA GCATTGGTAA CTGTCAGACC AAGTTTACTC 5400
ATATATACTT TAGATTGATT TAAAACTTCA TTTTTAATTT AAAAGGATCT 5450
AGGTGAAGAT CCTTTTTGAT AATCTCATGA CCAAAATCCC TTAACGTGAG 5500
TTTTCGTTCC ACTGAGCGTC AGACCCCGTA GAAAAGATCA AAGGATCTTC 5550
TTGAGATCCT TTTTTTCTGC GCGTAATCTG CTGCTTGCAA ACAAAAAAAC 5600
CACCGCTACC AGCGGTGGTT TGTTTGCCGG ATCAAGAGCT ACCAACTCTT 5650
TTTCCGAAGG TAACTGGCTT CAGCAGAGCG CAGATACCAA ATACTGTCCT 5700
TCTAGTGTAG CCGTAGTTAG GCCACCACTT CAAGAACTCT GTAGCACCGC 5750
CTACATACCT CGCTCTGCTA ATCCTGTTAC CAGTGGCTGC TGCCAGTGGC 5800
GATAAGTCGT GTCTTACCGG GTTGGACTCA AGACGATAGT TACCGGATAA 5850
GGCGCAGCGG TCGGGCTGAA CGGGGGGTTC GTGCACACAG CCCAGCTTGG 5900
AGCGAACGAC CTACACCGAA CTGAGATACC TACAGCGTGA GCATTGAGAA 5950
AGCGCCACGC TTCCCGAAGG GAGAAAGGCG GACAGGTATC CGGTAAGCGG 6000
CAGGGTCGGA ACAGGAGAGC GCACGAGGGA GCTTCCAGGG GGAAACGCCT 6050
GGTATCTTTA TAGTCCTGTC GGGTTTCGCC ACCTCTGACT TGAGCGTCGA 6100
TTTTTGTGAT GCTCGTCAGG GGGGCGGAGC CTATGGAAAA ACGCCAGCAA 6150
CGCGGCCTTT TTACGGTTCC TGGCCTTTTG CTGGCCTTTT GCTCACATGT 6200
TCTTTCCTGC GTTATCCCCT GATTCTGTGG ATAACCGTAT TACCGCCTTT 6250
GAGTGAGCTG ATACCGCTCG CCGCAGCCGA ACGACCGAGC GCAGCGAGTC 6300
AGTGAGCGAG GAAGCGGAAG AGCGCCCAAT ACGCAAACCG CCTCTCCCCG 6350
CGCGTTGGCC GATTCATTAA TCCAGCTGGC ACGACAGGTT TCCCGACTGG 6400
AAAGCGGGCA GTGAGCGCAA CGCAATTAAT GTGAGTTACC TCACTCATTA 6450
GGCACCCCAG GCTTTACACT TTATGCTTCC GGCTCGTATG TTGTGTGGAA 6500
TTGTGAGCGG ATAACAATTT CACACAGGAA ACAGCTATGA CCATGATTAC 6550
GAATTAA 6557

(2) INFORMATION FOR SEQ ID NO:4: 

(i) SEQUENCE CHARACTERISTICS : 

(A) LENGTH: 7305 bases 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4: 
TTCGAGCTCG CCCGACATTG ATTATTGACT AGTTATTAAT AGTAATCAAT 50 
TACGGGGTCA TTAGTTCATA GCCCATATAT GGAGTTCCGC GTTACATAAC 100 
TTACGGTAAA TGGCCCGCCT GGCTGACCGC CCAACGACCC CCGCCCATTG 150
ACGTCAATAA TGACGTATGT TCCCATAGTA ACGCCAATAG GGACTTTCCA 200
TTGACGTCAA TGGGTGGAGT ATTTACGGTA AACTGCCCAC TTGGCAGTAC 250
ATCAAGTGTA TCATATGCCA AGTACGCCCC CTATTGACGT CAATGACGGT 300
AAATGGCCCG CCTGGCATTA TGCCCAGTAC ATGACCTTAT GGGACTTTCC 350
TACTTGGCAG TACATCTACG TATTAGTCAT CGCTATTACC ATGGTGATGC 400
GGTTTTGGCA GTACATCAAT GGGCGTGGAT AGCGGTTTGA CTCACGGGGA 450
TTTCCAAGTC TCCACCCCAT TGACGTCAAT GGGAGTTTGT TTTGGCACCA 500
AAATCAACGG GACTTTCCAA AATGTCGTAA CAACTCCGCC CCATTGACGC 550
AAATGGGCGG TAGGCGTGTA CGGTGGGAGG TCTATATAAG CAGAGCTCGT 600
TTAGTGAACC GTCAGATCGC CTGGAGACGC CATCCACGCT GTTTTGACCT 650
CCATAGAAGA CACCGGGACC GATCCAGCCT CCGCGGCCGG GAACGGTGCA 700
TTGGAACGCG GATTCCCCGT GCCAAGAGTG ACGTAAGTAC CGCCTATAGA 750
GTCTATAGGC CCACCCCCTT GGCTTCGTTA GAACGCGGCT ACAATTAATA 800
CATAACCTTA TGTATCATAC ACATACGATT TAGGTGACAC TATAGAATAA 850
CATCCACTTT GCCTTTCTCT CCACAGGTGT CCACTCCCAG GTCCAACTGC 900
ACCTCGGTTC TAAGCTTATC GATATGAAAA AGCCTGAACT CACCGCGACG 950
TCTGTCGAGA AGTTTCTGAT CGAAAAGTTC GACAGCGTCT CCGACCTGAT 1000
GCAGCTCTCG GAGGGCGAAG AATCTCGTGC TTTCAGCTTC GATGTAGGAG 1050
GGCGTGGATA TGTCCTGCGG GTAAATAGCT GCGCCGATGG TTTCTACAAA 1100
GATCGTTATG TTTATCGGCA CTTTGCATCG GCCGCGCTCC CGATTCCGGA 1150
AGTGCTTGAC ATTGGGGAAT TCAGCGAGAG CCTGACCTAT TGCATCTCCC 1200
GCCGTGCACA GGGTGTCACG TTGCAACACC TGCCTGAAAC CGAACTGCCC 1250
GCTGTTCTGC AGCCGGTCGC GGAGGCCATG GATGCGATCG CTGCGGCCGA 1300
TCTTAGCCAG ACGAGCGGGT TCGGCCCATT CGGACCGCAA GGAATCGGTC 1350
AATACACTAC ATGGCGTGAT TTCATATGCG CGATTGCTGA TCCCCATGTG 1400
TATCACTGGC AAACTGTGAT GGACGACACC GTCAGTGCGT CCGTCGCGCA 1450
GGCTCTCGAT GAGCTGATGC TTTGGGCCGA GGACTGCCCC GAAGTCCGGC 1500
ACCTCGTGCA CGCGGATTTC GGCTCCAACA ATGTCCTGAC GGACAATGGC 1550
CGCATAACAG CGGTCATTGA CTGGAGCGAG GCGATGTTCG GGGATTCCCA 1600
ATACGAGGTC GCCAACATCT TCTTCTGGAG GCCGTGGTTG GCTTGTATGG 1650
AGCAGCAGAC GTACTTCGAG CGGAGGCATC CGGAGCTTGC AGGATCGCCG 1700
CGGCTCCGGG CGTATATGCT CCGCATTGGT CTTGACCAAC TCTATCAGAG 1750
CTTGGTTGAC GGCAATTTCG ATGATGCAGC TTGGGCGCAG GGTCGATGCG 1800
ACGCAATCGT CCGATCCGGA GCCGGGACTG TCGGGCGTAC ACAAATCGCC 1850
CGCAGAAGCG CGGCCGTCTG GACCGATGGC TGTGTAGAAG TACTCGCCGA 1900
TAGTGGAAAC CGACGCCCCA GCACTCGTCC GAGGGCAAAG GAATAGAGTA 1950
GATGCCGACC GAAGGATCCC CGGGGAATTC AATCGATGGC CGCCATGGCC 2000
CAACTTGTTT ATTGCAGCTT ATAATGGTTA CAAATAAAGC AATAGCATCA 2050
CAAATTTCAC AAATAAAGCA TTTTTTTCAC TGCATTCTAG TTGTGGTTTG 2100
TCCAAACTCA TCAATGTATC TTATCATGTC TGGATCGATC GGGAATTAAT 2150
TCGGCGCAGC ACCATGGCCT GAAATAACCT CTGAAAGAGG AACTTGGTTA 2200
GGTACCTTCT GAGGCGGAAA GAACCAGCTG TGGAATGTGT GTCAGTTAGG 2250
GTGTGGAAAG TCCCCAGGCT CCCCAGCAGG CAGAAGTATG CAAAGCATGC 2300
ATCTCAATTA GTCAGCAACC AGGTGTGGAA AGTCCCCAGG CTCCCCAGCA 2350
GGCAGAAGTA TGCAAAGCAT GCATCTCAAT TAGTCAGCAA CCATAGTCCC 2400
GCCCCTAACT CCGCCCATCC CGCCCCTAAC TCCGCCCAGT TCCGCCCATT 2450
CTCCGCCCCA TGGCTGACTA ATTTTTTTTA TTTATGCAGA GGCCGAGGCC 2500
GCCTCGGCCT CTGAGCTATT CCAGAAGTAG TGAGGAGGCT TTTTTGGAGG 2550
CCTAGGCTTT TGCAAAAAGC TAGCTTATCC GGCCGGGAAC GGTGCATTGG 2600
AACGCGGATT CCCCGTGCCA AGAGTCAGGT AAGTACCGCC TATAGAGTCT 2650
ATAGGCCCAC CCCCTTGGCT TCGTTAGAAC GCGGCTACAA TTAATACATA 2700
ACCTTTTGGA TCGATCCTAC TGACACTGAC ATCCACTTTT TCTTTTTCTC 2750
CACAGGTGTC CACTCCCAGG TCCAACTGCA CCTCGGTTCG CGAAGCTAGC 2800
TTGGGCTGCA TCGATTGAAT TCCACCATGG GATGGTCATG TATCATCCTT 2850
TTTCTAGTAG CAACTGCAAC TGGAGTACAT TCAGATATCC AGCTGACCCA 2900
GTCCCCGAGC TCCCTGTCCG CCTCTGTGGG CGATAGGGTC ACCATCACCT 2950
GCCGTGCCAG TCAGAGCGTC GATTACGATG GTGATAGCTA CATGAACTGG 3000
TATCAACAGA AACCAGGAAA AGCTCCGAAA CTACTGATTT ACGCGGCCTC 3050
GTACCTGGAG TCTGGAGTCC CTTCTCGCTT CTCTGGATCC GGTTCTGGGA 3100
CGGATTTCAC TCTGACCATC AGCAGTCTGC AGCCGGAAGA CTTCGCAACT 3150
TATTACTGTC AGCAAAGTCA CGAGGATCCG TACACATTTG GACAGGGTAC 3200
CAAGGTGGAG ATCAAACGAA CTGTGGCTGC ACCATCTGTC TTCATCTTCC 3250
CGCCATCTGA TGAGCAGTTG AAATCTGGAA CTGCCTCTGT TGTGTGCCTG 3300
CTGAATAACT TCTATCCCAG AGAGGCCAAA GTACAGTGGA AGGTGGATAA 3350
CGCCCTCCAA TCGGGTAACT CCCAGGAGAG TGTCACAGAG CAGGACAGCA 3400
AGGACAGCAC CTACAGCCTC AGCAGCACCC TGACGCTGAG CAAAGCAGAC 3450
TACGAGAAAC ACAAAGTCTA CGCCTGCGAA GTCACCCATC AGGGCCTGAG 3500
CTCGCCCGTC ACAAAGAGCT TCAACAGGGG AGAGTGTTAA GCTTCGATGG 3550
CCGCCATGGC CCAACTTGTT TATTGCAGCT TATAATGGTT ACAAATAAAG 3600
CAATAGCATC ACAAATTTCA CAAATAAAGC ATTTTTTTCA CTGCATTCTA 3650
GTTGTGGTTT GTCCAAACTC ATCAATGTAT CTTATCATGT CTGGATCGAT 3700
CGGGAATTAA TTCGGCGCAG CACCATGGCC TGAAATAACC TCTGAAAGAG 3750
GAACTTGGTT AGGTACCTTC TGAGGCGGAA AGAACCAGCT GTGGAATGTG 3800
TGTCAGTTAG GGTGTGGAAA GTCCCCAGGC TCCCCAGCAG GCAGAAGTAT 3850
GCAAAGCATG CATCTCAATT AGTCAGCAAC CAGGTGTGGA AAGTCCCCAG 3900
GCTCCCCAGC AGGCAGAAGT ATGCAAAGCA TGCATCTCAA TTAGTCAGCA 3950
ACCATAGTCC CGCCCCTAAC TCCGCCCATC CCGCCCCTAA CTCCGCCCAG 4000
TTCCGCCCAT TCTCCGCCCC ATGGCTGACT AATTTTTTTT ATTTATGCAG 4050
AGGCCGAGGC CGCCTCGGCC TCTGAGCTAT TCCAGAAGTA GTGAGGAGGC 4100
TTTTTTGGAG GCCTAGGCTT TTGCAAAAAG CTGTTAACAG CTTGGCACTG 4150
GCCGTCGTTT TACAACGTCG TGACTGGGAA AACCCTGGCG TTACCCAACT 4200
TAATCGCCTT GCAGCACATC CCCCCTTCGC CAGCTGGCGT AATAGCGAAG 4250
AGGCCCGCAC CGATCGCCCT TCCCAACAGT TGCGTAGCCT GAATGGCGAA 4300
TGGCGCCTGA TGCGGTATTT TCTCCTTACG CATCTGTGCG GTATTTCACA 4350
CCGCATACGT CAAAGCAACC ATAGTACGCG CCCTGTAGCG GCGCATTAAG 4400
CGCGGCGGGT GTGGTGGTTA CGCGCAGCGT GACCGCTACA CTTGCCAGCG 4450
CCCTAGCGCC CGCTCCTTTC GCTTTCTTCC CTTCCTTTCT CGCCACGTTC 4500
GCCGGCTTTC CCCGTCAAGC TCTAAATCGG GGGCTCCCTT TAGGGTTCCG 4550
ATTTAGTGCT TTACGGCACC TCGACCCCAA AAAACTTGAT TTGGGTGATG 4600
GTTCACGTAG TGGGCCATCG CCCTGATAGA CGGTTTTTCG CCCTTTGACG 4650
TTGGAGTCCA CGTTCTTTAA TAGTGGACTC TTGTTCCAAA CTGGAACAAC 4700
ACTCAACCCT ATCTCGGGCT ATTCTTTTGA TTTATAAGGG ATTTTGCCGA 4750
TTTCGGCCTA TTGGTTAAAA AATGAGCTGA TTTAACAAAA ATTTAACGCG 4800
AATTTTAACA AAATATTAAC GTTTACAATT TTATGGTGCA CTCTCAGTAC 4850
AATCTGCTCT GATGCCGCAT AGTTAAGCCA ACTCCGCTAT CGCTACGTGA 4900
CTGGGTCATG GCTGCGCCCC GACACCCGCC AACACCCGCT GACGCGCCCT 4950
GACGGGCTTG TCTGCTCCCG GCATCCGCTT ACAGACAAGC TGTGACCGTC 5000
TCCGGGAGCT GCATGTGTCA GAGGTTTTCA CCGTCATCAC CGAAACGCGC 5050
GAGGCAGTAT TCTTGAAGAC GAAAGGGCCT CGTGATACGC CTATTTTTAT 5100
AGGTTAATGT CATGATAATA ATGGTTTCTT AGACGTCAGG TGGCACTTTT 5150
CGGGGAAATG TGCGCGGAAC CCCTATTTGT TTATTTTTCT AAATACATTC 5200
AAATATGTAT CCGCTCATGA GACAATAACC CTGATAAATG CTTCAATAAT 5250
ATTGAAAAAG GAAGAGTATG AGTATTCAAC ATTTCCGTGT CGCCCTTATT 5300
CCCTTTTTTG CGGCATTTTG CCTTCCTGTT TTTGCTCACC CAGAAACGCT 5350
GGTGAAAGTA AAAGATGCTG AAGATCAGTT GGGTGCACGA GTGGGTTACA 5400
TCGAACTGGA TCTCAACAGC GGTAAGATCC TTGAGAGTTT TCGCCCCGAA 5450
GAACGTTTTC CAATGATGAG CACTTTTAAA GTTCTGCTAT GTGGCGCGGT 5500
ATTATCCCGT GATGACGCCG GGCAAGAGCA ACTCGGTCGC CGCATACACT 5550
ATTCTCAGAA TGACTTGGTT GAGTACTCAC CAGTCACAGA AAAGCATCTT 5600
ACGGATGGCA TGACAGTAAG AGAATTATGC AGTGCTGCCA TAACCATGAG 5650
TGATAACACT GCGGCCAACT TACTTCTGAC AACGATCGGA GGACCGAAGG 5700
AGCTAACCGC TTTTTTGCAC AACATGGGGG ATCATGTAAC TCGCCTTGAT 5750
CGTTGGGAAC CGGAGCTGAA TGAAGCCATA CCAAACGACG AGCGTGACAC 5800
CACGATGCCA GCAGCAATGG CAACAACGTT GCGCAAACTA TTAACTGGCG 5850
AACTACTTAC TCTAGCTTCC CGGCAACAAT TAATAGACTG GATGGAGGCG 5900
GATAAAGTTG CAGGACCACT TCTGCGCTCG GCCCTTCCGG CTGGCTGGTT 5950
TATTGCTGAT AAATCTGGAG CCGGTGAGCG TGGGTCTCGC GGTATCATTG 6000
CAGCACTGGG GCCAGATGGT AAGCCCTCCC GTATCGTAGT TATCTACACG 6050
ACGGGGAGTC AGGCAACTAT GGATGAACGA AATAGACAGA TCGCTGAGAT 6100
AGGTGCCTCA CTGATTAAGC ATTGGTAACT GTCAGACCAA GTTTACTCAT 6150
ATATACTTTA GATTGATTTA AAACTTCATT TTTAATTTAA AAGGATCTAG 6200
GTGAAGATCC TTTTTGATAA TCTCATGACC AAAATCCCTT AACGTGAGTT 6250
TTCGTTCCAC TGAGCGTCAG ACCCCGTAGA AAAGATCAAA GGATCTTCTT 6300
GAGATCCTTT TTTTCTGCGC GTAATCTGCT GCTTGCAAAC AAAAAAACCA 6350
CCGCTACCAG CGGTGGTTTG TTTGCCGGAT CAAGAGCTAC CAACTCTTTT 6400
TCCGAAGGTA ACTGGCTTCA GCAGAGCGCA GATACCAAAT ACTGTCCTTC 6450
TAGTGTAGCC GTAGTTAGGC CACCACTTCA AGAACTCTGT AGCACCGCCT 6500
ACATACCTCG CTCTGCTAAT CCTGTTACCA GTGGCTGCTG CCAGTGGCGA 6550
TAAGTCGTGT CTTACCGGGT TGGACTCAAG ACGATAGTTA CCGGATAAGG 6600
CGCAGCGGTC GGGCTGAACG GGGGGTTCGT GCACACAGCC CAGCTTGGAG 6650
CGAACGACCT ACACCGAACT GAGATACCTA CAGCGTGAGC ATTGAGAAAG 6700
CGCCACGCTT CCCGAAGGGA GAAAGGCGGA CAGGTATCCG GTAAGCGGCA 6750
GGGTCGGAAC AGGAGAGCGC ACGAGGGAGC TTCCAGGGGG AAACGCCTGG 6800
TATCTTTATA GTCCTGTCGG GTTTCGCCAC CTCTGACTTG AGCGTCGATT 6850
TTTGTGATGC TCGTCAGGGG GGCGGAGCCT ATGGAAAAAC GCCAGCAACG 6900
CGGCCTTTTT ACGGTTCCTG GCCTTTTGCT GGCCTTTTGC TCACATGTTC 6950
TTTCCTGCGT TATCCCCTGA TTCTGTGGAT AACCGTATTA CCGCCTTTGA 7000
GTGAGCTGAT ACCGCTCGCC GCAGCCGAAC GACCGAGCGC AGCGAGTCAG 7050
TGAGCGAGGA AGCGGAAGAG CGCCCAATAC GCAAACCGCC TCTCCCCGCG 7100
CGTTGGCCGA TTCATTAATC CAGCTGGCAC GACAGGTTTC CCGACTGGAA 7150
AGCGGGCAGT GAGCGCAACG CAATTAATGT GAGTTACCTC ACTCATTAGG 7200
CACCCCAGGC TTTACACTTT ATGCTTCCGG CTCGTATGTT GTGTGGAATT 7250
GTGAGCGGAT AACAATTTCA CACAGGAAAC AGCTATGACC ATGATTACGA 7300
ATTAA 7305
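Claim 4 recites the splice donor sequence GACGTAAGT, which occurs in SEQ ID NO: 4 at bases 730-738, spanning the groups "GCCAAGAGTG" and "ACGTAAGTAC". Because a motif can straddle the 10-base groups of a listing, it is easiest to search after removing the spacing. The following is a minimal sketch under that assumption; `find_motif` is a hypothetical helper, not part of the disclosure, and it reports hits in the listing's own 1-based numbering.

```python
def find_motif(seq: str, motif: str, start: int = 1) -> list[int]:
    """Return 1-based positions where `motif` begins in `seq`.

    `start` is the listing coordinate of the first base of `seq`, so an
    excerpt can be searched while keeping the original numbering.
    Overlapping occurrences are all reported.
    """
    seq = "".join(seq.split()).upper()  # drop the 10-base group spacing
    motif = motif.upper()
    hits = []
    i = seq.find(motif)
    while i != -1:
        hits.append(start + i)
        i = seq.find(motif, i + 1)  # advance by one so overlaps are kept
    return hits


# Bases 701-750 of SEQ ID NO: 4, copied from the listing above.
fragment = "TTGGAACGCG GATTCCCCGT GCCAAGAGTG ACGTAAGTAC CGCCTATAGA"
print(find_motif(fragment, "GACGTAAGT", start=701))  # [730]
```

An exact string match is deliberately simple; screening for the looser consensus donor of claim 3 would instead need a pattern with degenerate positions.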

CLAIMS

1. A DNA construct comprising a transcriptional initiation site, a
transcriptional termination site, a selectable gene, a product gene
provided 3' to the selectable gene, a transcriptional regulatory
region regulating transcription of both the selectable gene and the
product gene, the selectable gene being positioned within an intron
having a splice donor site 5' of the intron, which splice donor site
regulates expression of the product gene using the transcriptional
regulatory region.

2. The DNA construct of claim 1 wherein the splice donor site comprises
an efficient splice donor sequence.

3. The DNA construct of claim 2 wherein the splice donor site comprises
a consensus splice donor sequence.

4. The DNA construct of claim 2 wherein the splice donor site comprises
the sequence GACGTAAGT.

5. The DNA construct of claim 1 wherein the selectable gene is an
amplifiable gene.

6. The DNA construct of claim 5 wherein the amplifiable gene is DHFR.

7. The DNA construct of claim 1 wherein the transcriptional regulatory
region comprises a promoter and an enhancer.

8. A vector comprising the DNA construct of claim 1.

9. The vector of claim 8 wherein the selectable gene of the DNA
construct is an amplifiable gene.

10. The vector of claim 8 that is capable of replication in a eukaryotic
host.

11. A eukaryotic host cell comprising the vector of claim 10.

12. A eukaryotic host cell comprising the DNA construct of claim 5.

13. The host cell of claim 11 wherein the vector is introduced into the
host cell by electroporation.

14. A eukaryotic host cell comprising the DNA construct of claim 1
integrated into a chromosome of the host cell.

15. The host cell of claim 14 that is a mammalian cell.

16. A method for producing a product of interest comprising culturing the
host cell of claim 11 so as to express the product gene and
recovering the product from the host cell culture.

17. The method of claim 16 further comprising recovering the product from
the culture medium.

18. The method of claim 16 wherein the selectable gene is an amplifiable
gene and the splice donor site comprises an efficient splice donor
sequence.

19. A method for producing a product of interest comprising culturing the
host cell of claim 12 so as to express the product gene in a
selective medium comprising an amplifying agent for sufficient time
to allow amplification to occur, and recovering the product.

20. A method for producing eukaryotic cells having multiple copies of a
product gene comprising transforming eukaryotic cells with the DNA
construct of claim 5, growing the cells in a selective medium
comprising an amplifying agent for a sufficient time for
amplification to occur, and selecting cells having multiple copies
of the product gene.

21. The method of claim 20 further comprising recovering from the
selected cells the product of interest.